Re: Page breaking [was: Markers added to the wrong page]
[The Web Maestro] I don't know how the spec deals with this, but I doubt the spec cares which algorithm is used. That said, would it be a good idea to determine which algorithm to use based on something in userconfig.xml or something? If the Knuth system is more 'expensive' in terms of resources, we could make non-Knuth the default, and enable Knuth via a config file. Yes indeed. But without measurements I would guess that knuth page breaking would be far less expensive than the knuth line breaking. The line breaking has to deal with far more elements and because of that a far larger set of active nodes. But clearly a choice will be good, both for page and for line breaking. [Victor] The good news is that this is an excellent idea. The bad news is that FOP's determination to not even tolerate more than one algorithm is what drove FOP and FOray apart. So you have stumbled in to a somewhat sensitive area, which might explain the absence of responses (so far). BTW, since it is relevant here, thanks for your kind comments last week in another thread. You are ever the peacemaker, a very useful role. As long as the differences remain as great as they are, we *need* more than one FO implementation. In the meantime, I am much more useful and productive outside of FOP than I am in it, and, the FOP developers probably are also. To my perhaps still naive mind, layout consists of exactly two tasks: 1) finding line-breaks within logical blocks (nested blocks consisting of a new logical block), and 2) finding page-breaks between and within logical blocks. All other tasks can be conveniently handled within the two data structures, not being dependent on any strategy. The Knuth-style algorithm is by definition a total-fit algorithm. The scope of such an algorithm for the first task is the logical block itself. The scope of the second task is the page-sequence. So, to implement a Knuth-style algorithm (itself a good idea) for item 2, requires the ability to see an entire page-sequence in the AreaTree at one time. Not at all. It is a rather trivial change to knuth to pick a page break when there is N pages of lookahead. If we assume that finished page knuth element arrive one at time to the KnuthPage algorithm, the main loop becomes: addKnuthElement(element) { if element.isBox(): totalWidth += element.width else if element.isGlue(): if (previous element is box): considerLegalBreak() totalWidth += element.width totalStretch += element.stretch totalShrink += element.shrink else if element.isPenalty(): if element.penalty INFINITE: considerLegalBreak() if activeList.startLine currentPage + lookahead: // In the activeList, find the node with the best demerit. bestNode = findBestDemerit() // Remove all nodes in activeList which does not have bestNode // in their parentage. clearActiveList(bestNode) makePageBreak(bestNode) Here it is only clearActiveList() which does not have a fast implementation when using the usual implementation of ActiveNode. It will require a scan over all the active node and for each node a scan back thru the previous links to the node at currentPage+1. clearActiveList(bestNode): for node in activeList: // jump back thru previous (node.page-currentPage) times. while i++ node.page - currentPage previous = node.previous if previous != bestNode: activeList.remove(node) The rest of the methods called are similar to the methods in my knuth line breaking patch (which is quite similar the current implementation). My own insecurity comes from figuring out which penalty values and demerit algorihtm to use to get keeps and orphans right. regards, finn
RE: Page breaking [was: Markers added to the wrong page]
Finn Bock wrote: Not at all. It is a rather trivial change to knuth to pick a page break when there is N pages of lookahead. If we assume that finished page knuth element arrive one at time to the KnuthPage algorithm, the main loop becomes: addKnuthElement(element) { if element.isBox(): totalWidth += element.width else if element.isGlue(): if (previous element is box): considerLegalBreak() totalWidth += element.width totalStretch += element.stretch totalShrink += element.shrink else if element.isPenalty(): if element.penalty INFINITE: considerLegalBreak() if activeList.startLine currentPage + lookahead: // In the activeList, find the node with the best demerit. bestNode = findBestDemerit() // Remove all nodes in activeList which does not have bestNode // in their parentage. clearActiveList(bestNode) makePageBreak(bestNode) Here it is only clearActiveList() which does not have a fast implementation when using the usual implementation of ActiveNode. It will require a scan over all the active node and for each node a scan back thru the previous links to the node at currentPage+1. clearActiveList(bestNode): for node in activeList: // jump back thru previous (node.page-currentPage) times. while i++ node.page - currentPage previous = node.previous if previous != bestNode: activeList.remove(node) The rest of the methods called are similar to the methods in my knuth line breaking patch (which is quite similar the current implementation). Hmmm. I guess it depends on where the N in N pages of lookahead comes from. If that can be set to include the entire page-sequence, then you can call it a Knuth-style algorithm. Otherwise, I think you have something less, which is OK for those who don't want the performance hit. My own insecurity comes from figuring out which penalty values and demerit algorihtm to use to get keeps and orphans right. Yes. Fortunately, the XSL-FO standard provides some guidance here, but I agree that this part will be a challenge. Good luck. I am glad to see you guys heading down this path. Let me know if there is anything I can do to help. Victor Mote
Re: Page breaking [was: Markers added to the wrong page]
Finn Bock wrote: I would pass the element on the immidiate parent, which recursively passes them on the top-level LM in the direction. For inline, the toplevel would be LineLM and for blocks it would be the PageLM. Ok, I misunderstood what you wrote, now I think we were saying the same thing using different words! :-) Regards Luca
Re: Page breaking [was: Markers added to the wrong page]
[Luca] I'm not sure it is always possible to do this: sometimes the representation of a block depends on the properties of a higher level block. For example: outer block | - | | innerinner blockblock 12 In order to decide whether there can be a page break between the lines generated by innerBlock1 and innerBlock2, we must know: - if inner block 1 has keep-with-next - if inner block 2 has keep-with-previous - if outer block has keep-together This can be done if the outer BlockLM calls receives the elements created by its child, looks at them and adds / removes / corrects; could this be done if the inner BlockLMs directly give their element to the top-level LM? I would pass the element on the immidiate parent, which recursively passes them on the top-level LM in the direction. For inline, the toplevel would be LineLM and for blocks it would be the PageLM. BTW, what about your great refactoring of all the knuth code? I do not have the time atm to do the followup and bugfixes, so it will have to wait. If anyone also wants to commit it, feel free. regards, finn
Re: Page breaking [was: Markers added to the wrong page]
On Feb 7, 2005, at 7:21 AM, Luca Furini wrote: Finn Bock wrote: Using your description as a jumping point, here is my ideas for page breaking. I suppose it is even more pie-in-the-sky since I haven't yet written anything about it. As I have been doing a few experiments about page breaking, I'm happy to join in this conversation. I rarely find myself feeling the urge to enter these conversations (it's not really my forte), but every now and then, I find I might offer an idea or two I think (hope) will be constructive... The algorithm that the PageLM uses are a slightly modified knuth (no need to maintain fitnessclass, and with the ability to decide on a break when there is N pages of lookahead). The elements return from the LMs are boxes (for lines), spaces and penalties. Note that using the box - penalty - glue representation does not necessarily involve using Knuth's algorithm. A first-fit (or best-fit) algorithm could be enough in most situations; and if there are no adjustable elements in the fo document (for exaple, lots of paragraphs without adjustable spaces between them) Knuth's algorithm becomes a best-fit. Given its higher cost, maybe it could be used only when it could really produce a better output. I don't know how the spec deals with this, but I doubt the spec cares which algorithm is used. That said, would it be a good idea to determine which algorithm to use based on something in userconfig.xml or something? If the Knuth system is more 'expensive' in terms of resources, we could make non-Knuth the default, and enable Knuth via a config file. The elements are not returned from the LMs but pushed from the LM into the pageLM: parent.generateElement(new Space(resolveBefore()); parent.generateElement(new Box(lineHeigth); I'm not sure it is always possible to do this: sometimes the representation of a block depends on the properties of a higher level block. For example: outer block | - | | innerinner blockblock 12 In order to decide whether there can be a page break between the lines generated by innerBlock1 and innerBlock2, we must know: - if inner block 1 has keep-with-next - if inner block 2 has keep-with-previous - if outer block has keep-together This can be done if the outer BlockLM calls receives the elements created by its child, looks at them and adds / removes / corrects; could this be done if the inner BlockLMs directly give their element to the top-level LM? BTW, what about your great refactoring of all the knuth code? Regards, Luca Cheers! Web Maestro Clay -- [EMAIL PROTECTED] - http://homepage.mac.com/webmaestro/ My religion is simple. My religion is kindness. - HH The 14th Dalai Lama of Tibet
RE: Page breaking [was: Markers added to the wrong page]
The Web Maestro wrote: I don't know how the spec deals with this, but I doubt the spec cares which algorithm is used. That said, would it be a good idea to determine which algorithm to use based on something in userconfig.xml or something? If the Knuth system is more 'expensive' in terms of resources, we could make non-Knuth the default, and enable Knuth via a config file. Hi Clay: The good news is that this is an excellent idea. The bad news is that FOP's determination to not even tolerate more than one algorithm is what drove FOP and FOray apart. So you have stumbled in to a somewhat sensitive area, which might explain the absence of responses (so far). BTW, since it is relevant here, thanks for your kind comments last week in another thread. You are ever the peacemaker, a very useful role. As long as the differences remain as great as they are, we *need* more than one FO implementation. In the meantime, I am much more useful and productive outside of FOP than I am in it, and, the FOP developers probably are also. To my perhaps still naive mind, layout consists of exactly two tasks: 1) finding line-breaks within logical blocks (nested blocks consisting of a new logical block), and 2) finding page-breaks between and within logical blocks. All other tasks can be conveniently handled within the two data structures, not being dependent on any strategy. The Knuth-style algorithm is by definition a total-fit algorithm. The scope of such an algorithm for the first task is the logical block itself. The scope of the second task is the page-sequence. So, to implement a Knuth-style algorithm (itself a good idea) for item 2, requires the ability to see an entire page-sequence in the AreaTree at one time. I am a little confused about whether FOP's design will even allow that to happen right now. I understood both Keiron's and Finn's design ideas (and Peter West's also) to be releasing pages from the AreaTree long before a page-sequence was even completely parsed. However, I may have misunderstood, especially Finn's, which I am especially fuzzy on. I read somewhere that Peter Karow was going to try to implement a Knuth-style algorithm for URW, and I have tried desperately and unsuccessfully to get a copy of his book on the topic. If anybody knows that status of that effort or how to find his out-of-print books, please let us know. The general approach that FOray hopes to take eventually is a first-fit algorithm for the initial pass through a page-sequence, then a second optimization look that I hope to make do a Knuth-style evaluation. That may be sub-optimal, but it is my best guess right now about a reasonable way to proceed. Victor Mote
Re: Page breaking [was: Markers added to the wrong page]
On Feb 7, 2005, at 11:57 AM, Victor Mote wrote: The Web Maestro wrote: I don't know how the spec deals with this, but I doubt the spec cares which algorithm is used. That said, would it be a good idea to determine which algorithm to use based on something in userconfig.xml or something? If the Knuth system is more 'expensive' in terms of resources, we could make non-Knuth the default, and enable Knuth via a config file. Hi Clay: The good news is that this is an excellent idea. The bad news is that FOP's determination to not even tolerate more than one algorithm is what drove FOP and FOray apart. So you have stumbled in to a somewhat sensitive area, which might explain the absence of responses (so far). Thanks for the heads up, and also for the 'clue'... I never completely understood what happened, as it occurred before I was committed. ;-) I understand a bit more, thanks to this e-mail... I suspect there may be another 'side' to the story--there always is--and that there's may be other contributing factors... but this helps me understand much more than I understood before. Your explanation below also helps! I probably just need to ask for help when I don't understand something... but I've got enough going in my head anyway... FOP-DEV... Please don't take this as griping (it's not!), but I get lost in thee BP/BPD/IPD/AT speak now and again, which is one reason I tend to stay out of the fray on that stuff. If I get confused--and choose to attempt to contribute to a discussion--I'll speak up about anything that strikes my fancy. If there's something I don't get... I'll search Google first. BTW, since it is relevant here, thanks for your kind comments last week in another thread. You are ever the peacemaker, a very useful role. As long as the differences remain as great as they are, we *need* more than one FO implementation. In the meantime, I am much more useful and productive outside of FOP than I am in it, and, the FOP developers probably are also. To my perhaps still naive mind, layout consists of exactly two tasks: 1) finding line-breaks within logical blocks (nested blocks consisting of a new logical block), and 2) finding page-breaks between and within logical blocks. All other tasks can be conveniently handled within the two data structures, not being dependent on any strategy. The Knuth-style algorithm is by definition a total-fit algorithm. The scope of such an algorithm for the first task is the logical block itself. The scope of the second task is the page-sequence. So, to implement a Knuth-style algorithm (itself a good idea) for item 2, requires the ability to see an entire page-sequence in the AreaTree at one time. I am a little confused about whether FOP's design will even allow that to happen right now. I understood both Keiron's and Finn's design ideas (and Peter West's also) to be releasing pages from the AreaTree long before a page-sequence was even completely parsed. However, I may have misunderstood, especially Finn's, which I am especially fuzzy on. I read somewhere that Peter Karow was going to try to implement a Knuth-style algorithm for URW, and I have tried desperately and unsuccessfully to get a copy of his book on the topic. If anybody knows that status of that effort or how to find his out-of-print books, please let us know. Might these links help: http://www.google.com/search?hl=enq=Peter+Karow+bookspell=1 http://www.amazon.com/exec/obidos/ISBN%3D0387572236/102-3161708-3958541 http://www.amazon.com/exec/obidos/ISBN%3D0387565094/102-3161708-3958541 The general approach that FOray hopes to take eventually is a first-fit algorithm for the initial pass through a page-sequence, then a second optimization look that I hope to make do a Knuth-style evaluation. That may be sub-optimal, but it is my best guess right now about a reasonable way to proceed. Victor Mote Thanks again for the explanation! Cheers! Web Maestro Clay -- [EMAIL PROTECTED] - http://homepage.mac.com/webmaestro/ My religion is simple. My religion is kindness. - HH The 14th Dalai Lama of Tibet
RE: Page breaking [was: Markers added to the wrong page]
The Web Maestro wrote: e-mail... I suspect there may be another 'side' to the story--there always is--and that there's may be other contributing factors... but this helps me understand much more than I understood before. Your explanation below also Yes, there is another side, and I respect that. The point is that the differences are hard-wired at a pretty low-level (think silicon instead of Java). I am constitutionally unable to make myself work on a project that *can* be broken up into smaller pieces and isn't. Others appear to be constitutionally unable to allow that to happen. Most can probably live with it either way. OK, not every problem is solvable, and not every difference is reconcilable. We go on. You guys pull east, I'll pull west, and as long as we're pulling *different* wagons, we each have a shot at making some progress. Might these links help: I think those are the other books, but, to tell the truth, after looking at the links, I can't tell. I'll have to go dig out my old notes to see which book I was wanting. I'll check into it further. Thanks. Victor Mote