Re: Large files.

2004-12-15 Thread Simon Pepping
On Mon, Dec 13, 2004 at 02:26:40PM -0700, Victor Mote wrote:
 Simon Pepping wrote:
 
  The code you presented seems to be an algorithm implementing 
  an iterator over a tree. Because it maintains its state, it 
  can be stopped and resumed at will, provided you keep a 
  reference to it.
 
  If LMIter would have a reference to its parent LMIter, and 
  could return to it after having processed all the siblings, 
  it would realize such a construct. One could stop the 
  iteration, retain a reference to the active LMIter, and resume later.
  
  Not being dependent on the Java stack would make the 
  programming much more robust. One would have more freedom to 
  insert actions, without the fear to lose the iteration state 
  if one would not carefully return to the same function.
  
  Such a construct would be equally suitable to pull parsing. 
  The LMIter call to the LM method preLoadNext would request 
  more child fo nodes, which the parser would provide on this demand.
  
  Do you want to traverse the FO tree, without relying on the 
  Java stack, and drive the layout process from there by firing 
  node events?
 
 Hi Simon:
 
 You responded to my last posting in this thread, but your questions seem to
 be directed to Finn, so I will let him answer them. Please let me know if I
 have misunderstood.

Victor,

You are right. I replied to the last message in the thread, but it
contains questions to Finn. Sorry for being unclear.

Regards, Simon

-- 
Simon Pepping
home page: http://www.leverkruid.nl



RE: Large files.

2004-12-13 Thread Victor Mote
Finn Bock wrote:

 Did you notice that if a FOTree (or a fragment of it) is 
 serialized to a preorder sequential representation with end 
 markers, the preorder, postorder and child events can be 
 fired directly from the input stream?
 
 IOW the event based layout can work both of a normal 
 parent/children linked tree and a sequential tree.

Hmmm. I must have totally misunderstood your original point, which I think
you expressed much better in your second paragraph above. I certainly didn't
mean to argue against event-based layout, which, in general, I support, but
rather against the idea that an FOTree node can necessarily be laid out when
it is first encountered. And I think I understand now why you have done the
massive propagation of properties -- am I correct in understanding that you
are essentially flattening the tree so that inheritance must be captured
before that flattening takes place? Or are you simply making that an option
now?

  The bottom line is that, if you
  conceivably need to see both the beginning and end of the input 
  simultaneously (which we do),
 
 I can see a need for several passes over the same tree 
 fragment, but do we really need to see the beginning and the 
 end at the same time?
 
 The auto table example appears to need 2 or 3 sequential 
 passes over the table subtree (one to find the 
 minimum/maximum column width and one to do the actual 
 layout), but the layout process is still sequential.

Perhaps we have a semantical misunderstanding here. By see, I mean that we
know what is in it, and a multiple-pass solution can accomplish that. I'm
sure I acknowledged that as one of the two possibly ways that I could think
of to accomplish the end.

Victor Mote



RE: Large files.

2004-12-13 Thread Victor Mote
Simon Pepping wrote:

 The code you presented seems to be an algorithm implementing 
 an iterator over a tree. Because it maintains its state, it 
 can be stopped and resumed at will, provided you keep a 
 reference to it.

 If LMIter would have a reference to its parent LMIter, and 
 could return to it after having processed all the siblings, 
 it would realize such a construct. One could stop the 
 iteration, retain a reference to the active LMIter, and resume later.
 
 Not being dependent on the Java stack would make the 
 programming much more robust. One would have more freedom to 
 insert actions, without the fear to lose the iteration state 
 if one would not carefully return to the same function.
 
 Such a construct would be equally suitable to pull parsing. 
 The LMIter call to the LM method preLoadNext would request 
 more child fo nodes, which the parser would provide on this demand.
 
 On Mon, Dec 13, 2004 at 08:29:43AM -0700, Victor Mote wrote:
  Finn Bock wrote:
  
   Did you notice that if a FOTree (or a fragment of it) is 
 serialized 
   to a preorder sequential representation with end markers, the 
   preorder, postorder and child events can be fired 
 directly from the 
   input stream?
   
   IOW the event based layout can work both of a normal 
 parent/children 
   linked tree and a sequential tree.
  
  Hmmm. I must have totally misunderstood your original 
 point, which I 
  think you expressed much better in your second paragraph above. I 
  certainly didn't mean to argue against event-based layout, 
 which, in 
  general, I support, but rather against the idea that an FOTree node 
  can necessarily be laid out when it is first encountered. 
 And I think 
  I understand now why you have done the massive propagation of 
  properties -- am I correct in understanding that you are 
 essentially 
  flattening the tree so that inheritance must be captured 
 before that 
  flattening takes place? Or are you simply making that an option now?
 
 Do you want to traverse the FO tree, without relying on the 
 Java stack, and drive the layout process from there by firing 
 node events?

Hi Simon:

You responded to my last posting in this thread, but your questions seem to
be directed to Finn, so I will let him answer them. Please let me know if I
have misunderstood.

Victor Mote



Re: Large files.

2004-12-13 Thread Simon Pepping
The code you presented seems to be an algorithm implementing an
iterator over a tree. Because it maintains its state, it can be
stopped and resumed at will, provided you keep a reference to it.

If LMIter would have a reference to its parent LMIter, and could
return to it after having processed all the siblings, it would realize
such a construct. One could stop the iteration, retain a reference to
the active LMIter, and resume later.

Not being dependent on the Java stack would make the programming
much more robust. One would have more freedom to insert actions,
without the fear to lose the iteration state if one would not
carefully return to the same function.

Such a construct would be equally suitable to pull parsing. The LMIter
call to the LM method preLoadNext would request more child fo nodes,
which the parser would provide on this demand.

On Mon, Dec 13, 2004 at 08:29:43AM -0700, Victor Mote wrote:
 Finn Bock wrote:
 
  Did you notice that if a FOTree (or a fragment of it) is 
  serialized to a preorder sequential representation with end 
  markers, the preorder, postorder and child events can be 
  fired directly from the input stream?
  
  IOW the event based layout can work both of a normal 
  parent/children linked tree and a sequential tree.
 
 Hmmm. I must have totally misunderstood your original point, which I think
 you expressed much better in your second paragraph above. I certainly didn't
 mean to argue against event-based layout, which, in general, I support, but
 rather against the idea that an FOTree node can necessarily be laid out when
 it is first encountered. And I think I understand now why you have done the
 massive propagation of properties -- am I correct in understanding that you
 are essentially flattening the tree so that inheritance must be captured
 before that flattening takes place? Or are you simply making that an option
 now?

Do you want to traverse the FO tree, without relying on the Java
stack, and drive the layout process from there by firing node events?
 
Regards, Simon

-- 
Simon Pepping
home page: http://www.leverkruid.nl



Re: Large files.

2004-12-12 Thread Peter B. West
Finn Bock wrote:

The loop can be stopped when we temporary run out of FO tree nodes 
and restarted again when new nodes has been added. I suppose that the 
FO tree can then be viewed as a stream of FO nodes.

[Victor]
This model probably works fine if you never need to look ahead, but there
are numerous examples (well discussed in the archives) where one does 
need
to do that, the most obvious being automatic table layout. Peter's 
solution
to that is pull parsing, which probably works, but forces an 
intermingling
of interests that I need to avoid. My solution is to serialize the 
FOTree as
necessary 

Did you notice that if a FOTree (or a fragment of it) is serialized to a 
preorder sequential representation with end markers, the preorder, 
postorder and child events can be fired directly from the input stream?
Which is what Defoe does.  Now let go of the notion of firing events, 
and we are in agreement.  The SAX model is not relevant once the basic 
parsing is done.  With a full-fledged stream parser, it won't be 
relevant at all.

IOW the event based layout can work both of a normal parent/children 
linked tree and a sequential tree.

Peter


Re: Large files.

2004-12-12 Thread Finn Bock

The loop can be stopped when we temporary run out of FO tree 
nodes and restarted again when new nodes has been added. I 
suppose that the FO tree can then be viewed as a stream of FO nodes.
[Victor]
This model probably works fine if you never need to look ahead, but there
are numerous examples (well discussed in the archives) where one does need
to do that, the most obvious being automatic table layout. Peter's solution
to that is pull parsing, which probably works, but forces an intermingling
of interests that I need to avoid. My solution is to serialize the FOTree as
necessary 
Did you notice that if a FOTree (or a fragment of it) is serialized to a 
preorder sequential representation with end markers, the preorder, 
postorder and child events can be fired directly from the input stream?

IOW the event based layout can work both of a normal parent/children 
linked tree and a sequential tree.

The bottom line is that, if you
conceivably need to see both the beginning and end of the input
simultaneously (which we do), 
I can see a need for several passes over the same tree fragment, but do
we really need to see the beginning and the end at the same time?
The auto table example appears to need 2 or 3 sequential passes over the 
table subtree (one to find the minimum/maximum column width and one to 
do the actual layout), but the layout process is still sequential.

regards,
finn


Re: Large files.

2004-12-10 Thread Finn Bock
The problem with Keirons layout code (with respect to large input 
files) is that it works top-down on the LM tree and thus require the 
creating of the complete LM tree for the page sequence. To better fit 
within SAX event model the layout process should also be event driven, 
triggered f.ex by the endBlock() and character() events. That means 
that the traversing of the FOTree cannot be easily done using 
recursive decend. Instead the main loop over the FO tree could use a 
non-recusive tree treverse like this:

public Element traverse(Node q, Node end, int mode) {
while (q != end) {
switch (mode) {
case FIRST:
// First time in the node.
preorder(q);
if (q.hasChildNodes()) {
q = q.getFirstChild();
break;
}
mode = NEXT;
// fallthrough
case NEXT:
// All children are handled.
postorder(q);
// Add the node to its parent.
if (q.getParentNode() != null) {
child(q.getParentNode(), q);
}
// fallthrough
case RESTART:
// Attempt to move vertically to next sibling.
 horizontally?
if (q.getNextSibling() != null) {
q = q.getNextSibling();
mode = FIRST;
break;
}
// No sibling found, move to the parent to
// do postorder.
q = q.getParentNode();
mode = NEXT;
}
}
return null;
}
...
The loop can be stopped when we temporary run out of FO tree nodes and 
restarted again when new nodes has been added. I suppose that the FO 
tree can then be viewed as a stream of FO nodes.
[Peter]
I suppose so.  And I suppose that making layout event-driven would 
better fit in with the SAX event model.  It just doesn't make sense from 
the point of view of layout which is basically a top-down process.
By top-down, do you mean the visiting order of the nodes in the tree? Or 
the structure of the code that does the layout? Like this:

process(Node n) {
preorder(n)
for (each child in n) {
process(child)
child(n, child);
}
postorder(n);
}
The traverse() code does the exact same as the process() code. The 
differences are:

- All people immediately recoqnize process().
- process() can use local variables to store state.
A major drawback of process() is that is darn hard to suspend processing 
and then restart it. Our current getNextBreakPoss() code is a nice 
example of just how hard it is.

regards,
finn


Re: Large files.

2004-12-10 Thread Peter B. West
Finn Bock wrote:
The loop can be stopped when we temporary run out of FO tree nodes 
and restarted again when new nodes has been added. I suppose that the 
FO tree can then be viewed as a stream of FO nodes.

[Peter]
I suppose so.  And I suppose that making layout event-driven would 
better fit in with the SAX event model.  It just doesn't make sense 
from the point of view of layout which is basically a top-down process.

By top-down, do you mean the visiting order of the nodes in the tree? Or 
the structure of the code that does the layout? Like this:

process(Node n) {
preorder(n)
for (each child in n) {
process(child)
child(n, child);
}
postorder(n);
}
The traverse() code does the exact same as the process() code.
I noticed that, and wondered what you were getting at.
The 
differences are:

- All people immediately recoqnize process().
- process() can use local variables to store state.
A major drawback of process() is that is darn hard to suspend processing 
and then restart it. Our current getNextBreakPoss() code is a nice 
example of just how hard it is.
The penny drops.  There has always been a requirement for some form of 
co-routine, and I see now what you are getting at.  I think I see why 
you mentioned this in the same breath as the SAX event model, but that 
confuses the issue, I think.  What you are proposing is compatible with 
stream parsing, and in fact, fits quite well.  Alt-design parsing 
provides on-demand events to the FO tree builder, and is automatically 
regulated by up-stream demand.

It seems to me that the events most natural to the layout process are 
events in the FO tree, and that the main requirement for restarts in 
the FO tree is to do with the inherent dialogue between FO and Area 
trees.  However, the discussion is too general, and my understanding of 
the particular problems you are addressing is too vague, for me to make 
meaningful comments.

Peter


Re: Large files.

2004-12-10 Thread Peter B. West
Victor Mote wrote:
Finn Bock wrote:

The loop can be stopped when we temporary run out of FO tree 
nodes and restarted again when new nodes has been added. I 
suppose that the FO tree can then be viewed as a stream of FO nodes.

This model probably works fine if you never need to look ahead, but there
are numerous examples (well discussed in the archives) where one does need
to do that, the most obvious being automatic table layout. Peter's solution
to that is pull parsing, which probably works, but forces an intermingling
of interests that I need to avoid. My solution is to serialize the FOTree as
necessary (I do not have this working). The bottom line is that, if you
conceivably need to see both the beginning and end of the input
simultaneously (which we do), and you are writing so that disk space is the
only constraining factor, you will have to either be prepared to re-parse
the data (Peter's approach) or to serialize what has already been parsed (my
approach).
Victor, I think you may have misinterpreted an aspect of Defoe's design. 
 The re-parsing (of attribute data) is only required for static-content 
and markers.  Even then, it is not essential, merely the simplest way to 
achieve the result, given a stream of pre-parsed (in SAX terms) events. 
 I'm quite happy with serializing the partial results where rendering 
cannot be resolved due to forward references.  I don't see auto table 
layout and other localized look-ahead requiring this.

I have never been able to see a third alternative. But I am
always open to new ideas. I rather suspect that the current FOP design is
oriented toward common use-cases rather than a comprehensive treatment that
will handle all cases.
Peter


RE: Large files.

2004-12-10 Thread Victor Mote
Peter B. West wrote:

 Victor, I think you may have misinterpreted an aspect of 
 Defoe's design. 
   The re-parsing (of attribute data) is only required for 
 static-content and markers.  Even then, it is not essential, 
 merely the simplest way to achieve the result, given a stream 
 of pre-parsed (in SAX terms) events. 
   I'm quite happy with serializing the partial results where 
 rendering cannot be resolved due to forward references.  I 
 don't see auto table layout and other localized look-ahead 
 requiring this.

Now that you mention it, I recall you saying something about a temporary
data structure (IIRC), to which I replied sounds like an FO Tree, to
which I never received a substantive reply. My apologies -- I didn't mean to
misrepresent Defoe.

I have some other comments on the table layout and look-ahead issues, but
I'll make them to you off-line, because I think the FOP folks don't want the
design conversations here.

My apologies for my part in starting this thread. Simon's original comment
could be interpreted in either a general or specific way, and I just wanted
to clarify that aspect of it. I didn't mean to start a debate at all.

Victor Mote


Re: Large files.

2004-12-10 Thread Peter B. West
Finn Bock wrote:
...
The problem with Keirons layout code (with respect to large input files) 
is that it works top-down on the LM tree and thus require the creating 
of the complete LM tree for the page sequence. To better fit within SAX 
event model the layout process should also be event driven, triggered 
f.ex by the endBlock() and character() events. That means that the 
traversing of the FOTree cannot be easily done using recursive decend. 
Instead the main loop over the FO tree could use a non-recusive tree 
treverse like this:

public Element traverse(Node q, Node end, int mode) {
while (q != end) {
switch (mode) {
case FIRST:
// First time in the node.
preorder(q);
if (q.hasChildNodes()) {
q = q.getFirstChild();
break;
}
mode = NEXT;
// fallthrough
case NEXT:
// All children are handled.
postorder(q);
// Add the node to its parent.
if (q.getParentNode() != null) {
child(q.getParentNode(), q);
}
// fallthrough
case RESTART:
// Attempt to move vertically to next sibling.
 horizontally?
if (q.getNextSibling() != null) {
q = q.getNextSibling();
mode = FIRST;
break;
}
// No sibling found, move to the parent to
// do postorder.
q = q.getParentNode();
mode = NEXT;
}
}
return null;
}
...
The loop can be stopped when we temporary run out of FO tree nodes and 
restarted again when new nodes has been added. I suppose that the FO 
tree can then be viewed as a stream of FO nodes.
I suppose so.  And I suppose that making layout event-driven would 
better fit in with the SAX event model.  It just doesn't make sense from 
the point of view of layout which is basically a top-down process.

BTW, datatypes.Node has getPrecedingSibling(), getFollowingSibling() and 
many other doodads.  First code I ever wrote for FOP.

Peter


Large files.

2004-12-09 Thread Finn Bock
[Peter]
As a completely unintentional side-effect, it gave me the tools to solve 
the really critical FOP performance problem on large files.  This isn't 
going to be solved by micro-efficiencies on FO tree storage.
I'm going to use this as a excuse to outline my thinking for handling 
large input files. I'm almost sure that it is a bit different from your 
view.

This is a rant, and I don't have a working patch to back it up.
The problem with Keirons layout code (with respect to large input files) 
is that it works top-down on the LM tree and thus require the creating 
of the complete LM tree for the page sequence. To better fit within SAX 
event model the layout process should also be event driven, triggered 
f.ex by the endBlock() and character() events. That means that the 
traversing of the FOTree cannot be easily done using recursive decend. 
Instead the main loop over the FO tree could use a non-recusive tree 
treverse like this:

public Element traverse(Node q, Node end, int mode) {
while (q != end) {
switch (mode) {
case FIRST:
// First time in the node.
preorder(q);
if (q.hasChildNodes()) {
q = q.getFirstChild();
break;
}
mode = NEXT;
// fallthrough
case NEXT:
// All children are handled.
postorder(q);
// Add the node to its parent.
if (q.getParentNode() != null) {
child(q.getParentNode(), q);
}
// fallthrough
case RESTART:
// Attempt to move vertically to next sibling.
if (q.getNextSibling() != null) {
q = q.getNextSibling();
mode = FIRST;
break;
}
// No sibling found, move to the parent to
// do postorder.
q = q.getParentNode();
mode = NEXT;
}
}
return null;
}
This is the algorithm from Knuth TAoCP, vol I, Page 323 program S. It 
traverses any DOM tree fragment from 'q' to 'end' without using the java 
stack. The tree must support getFirstChild() and getNextSibling(). 
During the traverse it fires 'preorder', 'postorder' and 'child' events.

When applied to FOP the nodes are LayoutManagers and getFirstChild() and 
getNextSibling() will create the LayoutManager for the first child / 
next sibling.

The loop can be stopped when we temporary run out of FO tree nodes and 
restarted again when new nodes has been added. I suppose that the FO 
tree can then be viewed as a stream of FO nodes.

regards,
finn


RE: Large files.

2004-12-09 Thread Victor Mote
Finn Bock wrote:

 The loop can be stopped when we temporary run out of FO tree 
 nodes and restarted again when new nodes has been added. I 
 suppose that the FO tree can then be viewed as a stream of FO nodes.

This model probably works fine if you never need to look ahead, but there
are numerous examples (well discussed in the archives) where one does need
to do that, the most obvious being automatic table layout. Peter's solution
to that is pull parsing, which probably works, but forces an intermingling
of interests that I need to avoid. My solution is to serialize the FOTree as
necessary (I do not have this working). The bottom line is that, if you
conceivably need to see both the beginning and end of the input
simultaneously (which we do), and you are writing so that disk space is the
only constraining factor, you will have to either be prepared to re-parse
the data (Peter's approach) or to serialize what has already been parsed (my
approach). I have never been able to see a third alternative. But I am
always open to new ideas. I rather suspect that the current FOP design is
oriented toward common use-cases rather than a comprehensive treatment that
will handle all cases.

Victor Mote



processing large files

2004-09-21 Thread Vanderhaeghen Christophe
Hi,

At Volvo we implemented the FOP-mechanism for rendering mainframe lists into 
PDF-documents.
On the mainframe we build a valid xml-file. this file is sent to the java-server, 
which will validate, parse and convert the document into 1 or more PDF documents.
This is a very very short explanation of what the program does :-)

The mechanism works fine for us, but there's one major problem. One of our lists is 
over 100 mb and it requires over 1 gb memory to process via FOP.
We've searched everything to make it more performant, but everytime the FOP starts the 
memory that is consumed is huge!

On the website I saw that you are working on a more performant version of FOP, but 
that you don't know when it will be released (2003).
We are almost at the end of 2004 and we are very interested if you know when it might 
be released? Solutions or temporary tips might also be very usefull!
For the smaller files (up to 50 megabytes) FOP works fine, but processing large files 
isn't very performant.

Thanks for your help and time,

Christophe Vanderhaeghen
analist - programmer
_
Christophe Vanderhaeghen

Volvo IT Belgium
AS-OTD (location BTE)

Telephone: 09/250.42.26 
E-mail: [EMAIL PROTECTED]