Re: page-breaking strategies and performance
Jeremias Maerki wrote: Hi Jeremias, I finally have Knuth's Digital Typography and let myself enlighten by his well-written words. In [1] Simon outlined different strategies for page-breaking, obviously closely following the different approaches defined by Knuth. At first glance, I'd say that best-fit is probably the obvious strategy to select, especially if TeX is happy with it. Obviously, it can't find the optimal solution like this but the additional overhead (memory and CPU power) of a look-ahead/total-fit strategy is simply too much and unnecessary for things like invoices and insurance policies which are surely some of the most popular use cases of XSL-FO. Here, speed is extremely important. People writing documentation (maybe using DocBook) or glossy stock reports have additional requirements and don't mind the longer processing time and additional memory requirements. This leads me to the question if we shouldn't actually implement two page-breaking strategies (in the end, not both right now). For a speed-optimized algorithm, we could even think about ignoring side-floats. We have dozens of customers using an XSL-FO solution and I can confirm invoices and insurance policies are a common use case for XSL-FO. A lot of companies have performance as a priority and we have no one using side floats or even thinking about using them, so optimizing for speed by ignoring side floats sounds like a good idea! But this is just my 2 cents and may conflict with other people's wishes. Obviously, in this model we would have to make sure that we use a common model for both strategies. For example, we still have to make sure that the line layout gets information on the available IPD on each line, but probably this will not be a big problem to include later. An enhanced/adjusted box/glue/penalty model sounds like a good idea to me especially since Knuth hints at that in his book, too. There's also a question if part of the infrastructure from line breaking can be reused for page breaking, but I guess rather not. Probably best to re-create an algorithm from scratch for page breaking but line breaking can be reviewed for ideas. As for the plan to implement a new page-breaking mechanism: I've got to do it now. :-) I'm sorry if this may put some pressure on some of you. I'm also not sure if I'm fit already to tackle it, but I've got to do it anyway. Since I don't want to work with a series of patches like you guys did earlier, I'd like to create a branch to do that on as soon as we've agreed on a strategy. Any objections to that? If we are going to branch the code for this then we need to make sure we have a plan to merge the branch back once we are confident in the new page breaking algorithm. This plan (which should be agreed before branching takes place) should include an acceptance procedure, e.g. will a single -1 be able to prevent the code being merged back? We dont want to end up with another alt-design, which eventually moved to source forge!!! Chris
Re: page-breaking strategies and performance
--- Chris Bowditch [EMAIL PROTECTED] wrote: As for the plan to implement a new page-breaking mechanism: I've got to do it now. :-) I'm sorry if this may put some pressure on some of you. I'm also not sure if I'm fit already to tackle it, but I've got to do it anyway. Since I don't want to work with a series of patches like you guys did earlier, I'd like to create a branch to do that on as soon as we've agreed on a strategy. Any objections to that? If we are going to branch the code for this then we need to make sure we have a plan to merge the branch back once we are confident in the new page breaking algorithm. This plan (which should be agreed before branching takes place) should include an acceptance procedure, e.g. will a single -1 be able to prevent the code being merged back? We dont want to end up with another alt-design, which eventually moved to source forge!!! Chris Either way is fine with me, but Chris brings up a very valid point. If you can tolerate and keep up with my minor code housekeeping from time to time in some of the layout managers (currently mostly PSLM), feel free to work from HEAD directly instead if you wish. Glen
Re: page-breaking strategies and performance
I'd rather not work on HEAD directly because after creating the basics for the new mechanism the whole thing will probably not work for some time (probably 2-4 weeks). But I'd like to be able to check in early so people can review. I expect that the life time of the branch will not exceed 8 weeks. So there's almost no chance that alt-design is repeated, especially since the basic LM infrastructure will not be altered big time and it looks like we are all going in the same direction for the new page-breaking. It's clear that it has to be done and it seems to be moveing in the direction of a derived Knuth approach. It's much like the migration to the Knuth line breaking and it's mostly the block-level LMs that will be affected. People can continue to work on HEAD during that time as long as nothing serious is altered in the block-level LMs which would make merging difficult. Before I can kick off we need to agree to the general approach for the algorithm and clear a few details so we are reasonably sure that it'll work. Once we have that the plan for the branch should not be a big deal if we take the above into account. On 02.03.2005 13:16:42 Glen Mazza wrote: --- Chris Bowditch [EMAIL PROTECTED] wrote: As for the plan to implement a new page-breaking mechanism: I've got to do it now. :-) I'm sorry if this may put some pressure on some of you. I'm also not sure if I'm fit already to tackle it, but I've got to do it anyway. Since I don't want to work with a series of patches like you guys did earlier, I'd like to create a branch to do that on as soon as we've agreed on a strategy. Any objections to that? If we are going to branch the code for this then we need to make sure we have a plan to merge the branch back once we are confident in the new page breaking algorithm. This plan (which should be agreed before branching takes place) should include an acceptance procedure, e.g. will a single -1 be able to prevent the code being merged back? We dont want to end up with another alt-design, which eventually moved to source forge!!! Chris Either way is fine with me, but Chris brings up a very valid point. If you can tolerate and keep up with my minor code housekeeping from time to time in some of the layout managers (currently mostly PSLM), feel free to work from HEAD directly instead if you wish. Glen Jeremias Maerki
Re: page-breaking strategies and performance
Just a sanity check here, the XSL specification seems to suggest always the first-fit strategy for page breaking *except* where keeps are explicitly specified. Am I correct here? And, if so, is what you're planning going to result in an algorithm that will help us do this? Thanks, Glen --- Jeremias Maerki [EMAIL PROTECTED] wrote: I'd rather not work on HEAD directly because after creating the basics for the new mechanism the whole thing will probably not work for some time (probably 2-4 weeks). But I'd like to be able to check in early so people can review. I expect that the life time of the branch will not exceed 8 weeks. So there's almost no chance that alt-design is repeated, especially since the basic LM infrastructure will not be altered big time and it looks like we are all going in the same direction for the new page-breaking. It's clear that it has to be done and it seems to be moveing in the direction of a derived Knuth approach. It's much like the migration to the Knuth line breaking and it's mostly the block-level LMs that will be affected. People can continue to work on HEAD during that time as long as nothing serious is altered in the block-level LMs which would make merging difficult. Before I can kick off we need to agree to the general approach for the algorithm and clear a few details so we are reasonably sure that it'll work. Once we have that the plan for the branch should not be a big deal if we take the above into account. On 02.03.2005 13:16:42 Glen Mazza wrote: --- Chris Bowditch [EMAIL PROTECTED] wrote: As for the plan to implement a new page-breaking mechanism: I've got to do it now. :-) I'm sorry if this may put some pressure on some of you. I'm also not sure if I'm fit already to tackle it, but I've got to do it anyway. Since I don't want to work with a series of patches like you guys did earlier, I'd like to create a branch to do that on as soon as we've agreed on a strategy. Any objections to that? If we are going to branch the code for this then we need to make sure we have a plan to merge the branch back once we are confident in the new page breaking algorithm. This plan (which should be agreed before branching takes place) should include an acceptance procedure, e.g. will a single -1 be able to prevent the code being merged back? We dont want to end up with another alt-design, which eventually moved to source forge!!! Chris Either way is fine with me, but Chris brings up a very valid point. If you can tolerate and keep up with my minor code housekeeping from time to time in some of the layout managers (currently mostly PSLM), feel free to work from HEAD directly instead if you wish. Glen Jeremias Maerki
Re: page-breaking strategies and performance
Where did you find such a suggestion? I'd be interested to know if there's a hint in this direction in the spec. I thought it was up to the implementation to decide the strategy. I think the way we're now taking in our discussion suggests that we're not going to do a first-fit strategy at all. If we're really going down the two-strategy path we'll probably end up with a best-fit strategy and a total-fit or best-fit plus look-ahead. (See Simon's list [1]) But that's something we still need to figure out together. [1] http://wiki.apache.org/xmlgraphics-fop/PageLayout On 02.03.2005 14:48:17 Glen Mazza wrote: Just a sanity check here, the XSL specification seems to suggest always the first-fit strategy for page breaking *except* where keeps are explicitly specified. Am I correct here? And, if so, is what you're planning going to result in an algorithm that will help us do this? Jeremias Maerki
Re: page-breaking strategies and performance
I'm unsure here. My interpretation comes from two places: 1.) Section 4.8, the last paragraph of [1]: The area tree is constrained to satisfy all break conditions imposed. ***Each keep condition must also be satisfied***, except when this would cause a break condition or a stronger keep condition to fail to be satisfied. i.e., keep conditions need to be satisfied. 2.) The definitions of the three keep-[] properties [2] each have a initial value of auto, meaning There are no keep-[] conditions imposed by this property. So by default, if the user does not explicitly specify keep properties, e.g., keep-together.within-page, no text, pictures, etc. are to be kept together on the same page, if they wouldn't already be so due to free-flowing (i.e., first-fit) text. Everything would become free-flowing in order to obey the stylesheet writer's specifications. Just my $0.02. Thanks, Glen [1] http://www.w3.org/TR/2001/REC-xsl-20011015/slice4.html#keepbreak [2] http://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#keep-together --- Jeremias Maerki [EMAIL PROTECTED] wrote: Where did you find such a suggestion? I'd be interested to know if there's a hint in this direction in the spec. I thought it was up to the implementation to decide the strategy. I think the way we're now taking in our discussion suggests that we're not going to do a first-fit strategy at all. If we're really going down the two-strategy path we'll probably end up with a best-fit strategy and a total-fit or best-fit plus look-ahead. (See Simon's list [1]) But that's something we still need to figure out together. If we ever have multiple page-breaking options, it can be a user-defined configuration switch. No problem there. Glen [1] http://wiki.apache.org/xmlgraphics-fop/PageLayout On 02.03.2005 14:48:17 Glen Mazza wrote: Just a sanity check here, the XSL specification seems to suggest always the first-fit strategy for page breaking *except* where keeps are explicitly specified. Am I correct here? And, if so, is what you're planning going to result in an algorithm that will help us do this? Jeremias Maerki
Re: page-breaking strategies and performance
Thanks. I think this has only to do with the rules to handle keeps and breaks and how to resolve conflicts. I don't think, however, that these parts create a restriction which tells us what page-breaking strategy to pursue. We could probably run with a first-fit strategy and still fulfill the rules below if we accept a lot of backtracking. But as Simon suggested, this seems to be a poor approach. Keeps and breaks are only part of what a page breaking algorithm has to deal with. See [3]. [3] http://wiki.apache.org/xmlgraphics-fop/PageLayout/InfluencingFeatures On 02.03.2005 16:44:17 Glen Mazza wrote: I'm unsure here. My interpretation comes from two places: 1.) Section 4.8, the last paragraph of [1]: The area tree is constrained to satisfy all break conditions imposed. ***Each keep condition must also be satisfied***, except when this would cause a break condition or a stronger keep condition to fail to be satisfied. i.e., keep conditions need to be satisfied. 2.) The definitions of the three keep-[] properties [2] each have a initial value of auto, meaning There are no keep-[] conditions imposed by this property. So by default, if the user does not explicitly specify keep properties, e.g., keep-together.within-page, no text, pictures, etc. are to be kept together on the same page, if they wouldn't already be so due to free-flowing (i.e., first-fit) text. Everything would become free-flowing in order to obey the stylesheet writer's specifications. Just my $0.02. Thanks, Glen [1] http://www.w3.org/TR/2001/REC-xsl-20011015/slice4.html#keepbreak [2] http://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#keep-together --- Jeremias Maerki [EMAIL PROTECTED] wrote: Where did you find such a suggestion? I'd be interested to know if there's a hint in this direction in the spec. I thought it was up to the implementation to decide the strategy. I think the way we're now taking in our discussion suggests that we're not going to do a first-fit strategy at all. If we're really going down the two-strategy path we'll probably end up with a best-fit strategy and a total-fit or best-fit plus look-ahead. (See Simon's list [1]) But that's something we still need to figure out together. If we ever have multiple page-breaking options, it can be a user-defined configuration switch. No problem there. Glen [1] http://wiki.apache.org/xmlgraphics-fop/PageLayout On 02.03.2005 14:48:17 Glen Mazza wrote: Just a sanity check here, the XSL specification seems to suggest always the first-fit strategy for page breaking *except* where keeps are explicitly specified. Am I correct here? And, if so, is what you're planning going to result in an algorithm that will help us do this? Jeremias Maerki Jeremias Maerki
Re: page-breaking strategies and performance
Yes, I'm not in Simon's league here--I know very little about TeX--so I'll defer to you two on this issue. Just try to make sure that the final algorithm will help us support the keep-* properties. Thanks, Glen --- Jeremias Maerki [EMAIL PROTECTED] wrote: Thanks. I think this has only to do with the rules to handle keeps and breaks and how to resolve conflicts. I don't think, however, that these parts create a restriction which tells us what page-breaking strategy to pursue. We could probably run with a first-fit strategy and still fulfill the rules below if we accept a lot of backtracking. But as Simon suggested, this seems to be a poor approach. Keeps and breaks are only part of what a page breaking algorithm has to deal with. See [3]. [3] http://wiki.apache.org/xmlgraphics-fop/PageLayout/InfluencingFeatures On 02.03.2005 16:44:17 Glen Mazza wrote: I'm unsure here. My interpretation comes from two places: 1.) Section 4.8, the last paragraph of [1]: The area tree is constrained to satisfy all break conditions imposed. ***Each keep condition must also be satisfied***, except when this would cause a break condition or a stronger keep condition to fail to be satisfied. i.e., keep conditions need to be satisfied. 2.) The definitions of the three keep-[] properties [2] each have a initial value of auto, meaning There are no keep-[] conditions imposed by this property. So by default, if the user does not explicitly specify keep properties, e.g., keep-together.within-page, no text, pictures, etc. are to be kept together on the same page, if they wouldn't already be so due to free-flowing (i.e., first-fit) text. Everything would become free-flowing in order to obey the stylesheet writer's specifications. Just my $0.02. Thanks, Glen [1] http://www.w3.org/TR/2001/REC-xsl-20011015/slice4.html#keepbreak [2] http://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#keep-together --- Jeremias Maerki [EMAIL PROTECTED] wrote: Where did you find such a suggestion? I'd be interested to know if there's a hint in this direction in the spec. I thought it was up to the implementation to decide the strategy. I think the way we're now taking in our discussion suggests that we're not going to do a first-fit strategy at all. If we're really going down the two-strategy path we'll probably end up with a best-fit strategy and a total-fit or best-fit plus look-ahead. (See Simon's list [1]) But that's something we still need to figure out together. If we ever have multiple page-breaking options, it can be a user-defined configuration switch. No problem there. Glen [1] http://wiki.apache.org/xmlgraphics-fop/PageLayout On 02.03.2005 14:48:17 Glen Mazza wrote: Just a sanity check here, the XSL specification seems to suggest always the first-fit strategy for page breaking *except* where keeps are explicitly specified. Am I correct here? And, if so, is what you're planning going to result in an algorithm that will help us do this? Jeremias Maerki Jeremias Maerki
Re: page-breaking strategies and performance
On 02.03.2005 17:05:55 Glen Mazza wrote: Yes, I'm not in Simon's league here--I know very little about TeX--so I'll defer to you two on this issue. I'm also still struggling. :-) Just try to make sure that the final algorithm will help us support the keep-* properties. Yes, the algorithm MUST be able to handle full keep support (among other things). That's part of why we need a new approach. The present one doesn't quite fit the picture, yet. Thankfully, with the new design we don't have to again rewrite the whole FOP. The present approach was very good to point us in the right direction and most of the effort already invested is not lost. We just have to improve a specific part. Jeremias Maerki
page-breaking strategies and performance
I finally have Knuth's Digital Typography and let myself enlighten by his well-written words. In [1] Simon outlined different strategies for page-breaking, obviously closely following the different approaches defined by Knuth. At first glance, I'd say that best-fit is probably the obvious strategy to select, especially if TeX is happy with it. Obviously, it can't find the optimal solution like this but the additional overhead (memory and CPU power) of a look-ahead/total-fit strategy is simply too much and unnecessary for things like invoices and insurance policies which are surely some of the most popular use cases of XSL-FO. Here, speed is extremely important. People writing documentation (maybe using DocBook) or glossy stock reports have additional requirements and don't mind the longer processing time and additional memory requirements. This leads me to the question if we shouldn't actually implement two page-breaking strategies (in the end, not both right now). For a speed-optimized algorithm, we could even think about ignoring side-floats. Obviously, in this model we would have to make sure that we use a common model for both strategies. For example, we still have to make sure that the line layout gets information on the available IPD on each line, but probably this will not be a big problem to include later. An enhanced/adjusted box/glue/penalty model sounds like a good idea to me especially since Knuth hints at that in his book, too. There's also a question if part of the infrastructure from line breaking can be reused for page breaking, but I guess rather not. As for the plan to implement a new page-breaking mechanism: I've got to do it now. :-) I'm sorry if this may put some pressure on some of you. I'm also not sure if I'm fit already to tackle it, but I've got to do it anyway. Since I don't want to work with a series of patches like you guys did earlier, I'd like to create a branch to do that on as soon as we've agreed on a strategy. Any objections to that? [1] http://wiki.apache.org/xmlgraphics-fop/PageLayout Jeremias Maerki
RE: page-breaking strategies and performance
Jeremias Maerki wrote: processing time and additional memory requirements. This leads me to the question if we shouldn't actually implement two page-breaking strategies (in the end, not both right now). For a speed-optimized algorithm, we could even think about ignoring side-floats. Obviously, in this model we would have to make sure that we use a common model for both strategies. For example, we still have to make sure that the line layout gets information on the available IPD on each line, but probably this will not be a big problem to include later. This is an excellent idea. It has from time to time gone under the moniker LayoutStrategy or pluggable layout. To do it without duplicating everything requires that the other pieces of the system be modularized, the concerns separated so that they can be reused. The upside is tremendous and the cost pays for itself in developer productivity. Victor Mote
Re: page-breaking strategies and performance
On Tue, Mar 01, 2005 at 03:02:38PM +0100, Jeremias Maerki wrote: As for the plan to implement a new page-breaking mechanism: I've got to do it now. :-) I'm sorry if this may put some pressure on some of you. I'm also not sure if I'm fit already to tackle it, but I've got to do it anyway. Since I don't want to work with a series of patches like you guys did earlier, I'd like to create a branch to do that on as soon as we've agreed on a strategy. Any objections to that? That is a good idea. Regards, Simon -- Simon Pepping home page: http://www.leverkruid.nl
Re: page-breaking strategies and performance
On Tue, Mar 01, 2005 at 07:52:27AM -0700, Victor Mote wrote: Jeremias Maerki wrote: processing time and additional memory requirements. This leads me to the question if we shouldn't actually implement two page-breaking strategies (in the end, not both right now). For a speed-optimized algorithm, we could even think about ignoring side-floats. Obviously, in this model we would have to make sure that we use a common model for both strategies. For example, we still have to make sure that the line layout gets information on the available IPD on each line, but probably this will not be a big problem to include later. This is an excellent idea. It has from time to time gone under the moniker LayoutStrategy or pluggable layout. To do it without duplicating everything requires that the other pieces of the system be modularized, the concerns separated so that they can be reused. The upside is tremendous and the cost pays for itself in developer productivity. The idea of having two page breaking strategies is OK. But it is also a goal that is yet far over the horizon. I hope this is smaller than having pluggable layout. We should try to express the layout constraints in a simple language, which can be used by the algorithms of both strategies. Knuth's model is an effort to achieve that, and a PageLayoutManager which receives the Knuth elements and invokes the appropriate algorithm goes with it. Such a setup should not only enable multiple page breaking strategies, but also help us implement a simple strategy to start with, and gradually evolve it to a higher level of sophistication. Regards, Simon -- Simon Pepping home page: http://www.leverkruid.nl
Re: page-breaking strategies and performance
On 01.03.2005 22:25:12 Simon Pepping wrote: On Tue, Mar 01, 2005 at 07:52:27AM -0700, Victor Mote wrote: Jeremias Maerki wrote: processing time and additional memory requirements. This leads me to the question if we shouldn't actually implement two page-breaking strategies (in the end, not both right now). For a speed-optimized algorithm, we could even think about ignoring side-floats. Obviously, in this model we would have to make sure that we use a common model for both strategies. For example, we still have to make sure that the line layout gets information on the available IPD on each line, but probably this will not be a big problem to include later. This is an excellent idea. It has from time to time gone under the moniker LayoutStrategy or pluggable layout. To do it without duplicating everything requires that the other pieces of the system be modularized, the concerns separated so that they can be reused. The upside is tremendous and the cost pays for itself in developer productivity. The idea of having two page breaking strategies is OK. But it is also a goal that is yet far over the horizon. Right. What I'd like to achieve is having a usable layout engine with the minimum of effort for most use cases but without blocking our way in terms of full compliance like what happened with the old code base. I also don't want to invest to much time in an infrastructure to support pluggable strategies, only that we keep it in mind while we build the first one. I hope this is smaller than having pluggable layout. My hope, too. The critical part is to determine the model that helps us express all the elements of the XSL-FO standard. We should try to express the layout constraints in a simple language, which can be used by the algorithms of both strategies. Knuth's model is an effort to achieve that, and a PageLayoutManager which receives the Knuth elements and invokes the appropriate algorithm goes with it. That's what I'm currently trying to figure out. I guess we'll need to sketch all the different layout elements that we need to support like you started. Such a setup should not only enable multiple page breaking strategies, but also help us implement a simple strategy to start with, and gradually evolve it to a higher level of sophistication. That's the idea. Jeremias Maerki