### Re: Plass, Michael Frederick: Optimal Pagination Techniques for Automatic Typesetting Systems

An alternative link to the documents Renaud referred to: http://www.le-hacker.org/hacks/pagination/ There are a few other quite interesting documents on layout questions, particularly something about tables which might come in handy later. On 04.03.2005 02:14:56 Renaud Richardet wrote: Vincent Hennebert wrote: By looking for this reference, I found the following article: www.pi6.fernuni-hagen.de/publ/tr234.pdf It's entitled 'On the Pagination of Complex Documents' (actually it's also referencing Plass). There's another article, where the top level of the algorithm is presented (page 8). http://www.pi6.fernuni-hagen.de/publ/tr205.pdf Those 2 articles are summaries of a book. The link to command the book can be found at the bottom of http://web.informatik.uni-bonn.de/I/research/Pagination/ Renaud Jeremias Maerki

### RE: Plass, Michael Frederick: Optimal Pagination Techniques for Automatic Typesetting Systems

Jeremias Maerki wrote: While looking for material on page breaking I found several references to this document: http://wwwlib.umi.com/dissertations/fullcit/8124134 Does anyone know if it's worth ordering and waiting for it? The unfortunate thing is that they don't seem to have a PDF version that I could download immediately for a reasonable fee. Wow. This looks like it is very valuable. I have ordered it for my own use, and I'll be glad to give you a book review when it arrives to help you decide whether it is worthwhile for you or not. I am especially interested in the summary's comment: For certain simple badness functions, the pagination problem is NP-complete; Dealing with that challenge is the likely tricky spot in all of this. My intuition has always been that the page-breaking problem is much more complicated than the line-breaking one, partly because lines must be laid out to even think about page-breaking (and line lengths can change as they move around), partly because you are effectively working with changes in two dimensions instead of one, and partly because there seem to me to be a lot more variables in the problem. I am hoping to find some insight into the detection and workarounds for the NP-complete situations. Note that Stanford is Knuth's school, the date year is the same as that of Chapter 3 of Knuth's Digital Typography, and that the author is the co-author of that article. It may be possible to infer the same information from looking at the TeX source code. Also, another source of similar information would be Volume I of Knuth's Computers and Typesetting, aka The TeXbook. It is essentially a commentary on TeX, by Knuth. Chapter 15 is entitled How TeX Makes Lines into Pages. You guys are way ahead of me in terms of thinking about how to implement this stuff. As you know, my approach has been to leave this stuff for last, preferring instead to solve the outer-layer problems first, and provide for multiple implementations that can be improved in parallel. However, I have a great interest in your efforts, and will be glad to help any way that I can. And, FWIW, I think you are on the right general track, in this regard at least. Victor Mote

### Re: Plass, Michael Frederick: Optimal Pagination Techniques for Automatic Typesetting Systems

On 03.03.2005 16:19:24 Victor Mote wrote: Jeremias Maerki wrote: While looking for material on page breaking I found several references to this document: http://wwwlib.umi.com/dissertations/fullcit/8124134 Does anyone know if it's worth ordering and waiting for it? The unfortunate thing is that they don't seem to have a PDF version that I could download immediately for a reasonable fee. Wow. This looks like it is very valuable. I have ordered it for my own use, and I'll be glad to give you a book review when it arrives to help you decide whether it is worthwhile for you or not. I will probably order it anyway because I need it yesterday. :-) That's why I wanted the PDF version which they don't seem to have. Sigh. I am especially interested in the summary's comment: For certain simple badness functions, the pagination problem is NP-complete; Dealing with that challenge is the likely tricky spot in all of this. My intuition has always been that the page-breaking problem is much more complicated than the line-breaking one, partly because lines must be laid out to even think about page-breaking (and line lengths can change as they move around), partly because you are effectively working with changes in two dimensions instead of one, and partly because there seem to me to be a lot more variables in the problem. I am hoping to find some insight into the detection and workarounds for the NP-complete situations. I've found other references to discussion about communication between line and page breaking algorithms. But that stuff was mostly overview-style and written in a language I don't understand well: universitary language. That's probably the first time I regret stopping university after one semester because I hate mathematics. :-) Note that Stanford is Knuth's school, the date year is the same as that of Chapter 3 of Knuth's Digital Typography, and that the author is the co-author of that article. It may be possible to infer the same information from looking at the TeX source code. Also, another source of similar information would be Volume I of Knuth's Computers and Typesetting, aka The TeXbook. It is essentially a commentary on TeX, by Knuth. Chapter 15 is entitled How TeX Makes Lines into Pages. I haven't dared look into the TeX source code, yet, but I've read most of the chapter you mention. Didn't really help because there are many many TeX-specific things in there. You guys are way ahead of me in terms of thinking about how to implement this stuff. As you know, my approach has been to leave this stuff for last, preferring instead to solve the outer-layer problems first, and provide for multiple implementations that can be improved in parallel. However, I have a great interest in your efforts, and will be glad to help any way that I can. And, FWIW, I think you are on the right general track, in this regard at least. I very much hope so. But it becomes more and more apparent that this will be the greatest challenge in my programmer's life. Wow indeed. Jeremias Maerki

### Re: Plass, Michael Frederick: Optimal Pagination Techniques for Automatic Typesetting Systems

Happy to see how you have reprioritized your efforts over the past two months [1], and much, much for the better. Glen --- Jeremias Maerki [EMAIL PROTECTED] wrote: I very much hope so. But it becomes more and more apparent that this will be the greatest challenge in my programmer's life. Wow indeed. Jeremias Maerki [1] http://marc.theaimsgroup.com/?l=fop-devm=110495579414655w=2

### Re: Plass, Michael Frederick: Optimal Pagination Techniques for Automatic Typesetting Systems

Actually, I haven't. The RTF edition idea (which I haven't discarded, yet, just postponed due to time shortage) is still very high on my private priority list. As you can guess there's currently another one which comes first. On 03.03.2005 18:31:12 Glen Mazza wrote: Happy to see how you have reprioritized your efforts over the past two months [1], and much, much for the better. Glen --- Jeremias Maerki [EMAIL PROTECTED] wrote: I very much hope so. But it becomes more and more apparent that this will be the greatest challenge in my programmer's life. Wow indeed. Jeremias Maerki [1] http://marc.theaimsgroup.com/?l=fop-devm=110495579414655w=2 Jeremias Maerki

### Re: Plass, Michael Frederick: Optimal Pagination Techniques for Automatic Typesetting Systems

On Thu, Mar 03, 2005 at 08:19:24AM -0700, Victor Mote wrote: Jeremias Maerki wrote: While looking for material on page breaking I found several references to this document: http://wwwlib.umi.com/dissertations/fullcit/8124134 Does anyone know if it's worth ordering and waiting for it? The unfortunate thing is that they don't seem to have a PDF version that I could download immediately for a reasonable fee. Wow. This looks like it is very valuable. I have ordered it for my own use, and I'll be glad to give you a book review when it arrives to help you decide whether it is worthwhile for you or not. I do not know it. It sounds like I should buy it as well. Note that Stanford is Knuth's school, the date year is the same as that of Chapter 3 of Knuth's Digital Typography, and that the author is the co-author of that article. It may be possible to infer the same information from looking at the TeX source code. Also, another source of similar information would be Volume I of Knuth's Computers and Typesetting, aka The TeXbook. It is essentially a commentary on TeX, by Knuth. Chapter 15 is entitled How TeX Makes Lines into Pages. Note that The TeXbook is TeX's user guide. Yes, Knuth's users are quite advanced. It was my first book in the direction of computers, and one of the most inspirational I have read. The TeX program is described in 'TeX The Program'. That text is weaved into the program code according to Knuth's literate programming system. It can be freely extracted from the program code. A TeX distribution like TeXLive contains the tools to do this. I intend to do so soon. If you want to do it yourself, TeXLive is available from the TUG website, www.tug.org. The TeX source code itself is available from the CTAN repository, but I fear that you have to do some work to set up all the tools. It is up to yourself to decide whether knowing TeX's implementation is useful. It is a best-fit algorithm. There is no look-ahead. For example, TeX is not able to balance two facing pages (or two columns on a page, which for TeX is the same). I guess that a dissertation like that cited above contains much more information than implemented in TeX. Regards, Simon -- Simon Pepping home page: http://www.leverkruid.nl

### RE: Plass, Michael Frederick: Optimal Pagination Techniques for Automatic Typesetting Systems

Simon Pepping wrote: Note that The TeXbook is TeX's user guide. Yes, Knuth's users are quite advanced. It was my first book in the direction of computers, and one of the most inspirational I have read. The TeX program is described in 'TeX The Program'. That text Oops. You're right. That is volume 2 from the same Computers and Typesetting series. set up all the tools. It is up to yourself to decide whether knowing TeX's implementation is useful. It is a best-fit algorithm. There is no look-ahead. For example, TeX is not able to balance two facing pages (or two columns on a page, which for TeX is the same). I guess that a dissertation like that cited above contains much more information than implemented in TeX. I'm not sure. The general TeX page-breaking algorithm is discussed in the paragraphs before Appendix A of Chapter 3 of Digital Typography. The general box/glue/penalty model is used, but only the current page is considered. So I think the difference between best-fit and total-fit (as described here anyway) is the amount of look-ahead itself, not so much the algorithm. This is why I thought Finn's (IIRC) idea of a variable look-ahead makes sense. A look-ahead of zero pages is a best-fit, a look-ahead of all pages is a total-fit. But the algorithm is the same. Anyway, I agree that the paper is probably the best source, but wanted to give Jeremias some options. Victor Mote

### RE: Plass, Michael Frederick: Optimal Pagination Techniques for Automatic Typesetting Systems

Victor Mote wrote: Oops. You're right. That is volume 2 from the same Computers and Typesetting series. Er, it is actually volume B. Victor Mote

### Re: Plass, Michael Frederick: Optimal Pagination Techniques for Automatic Typesetting Systems

Hi Fop Team, Simon Pepping a écrit : On Thu, Mar 03, 2005 at 08:19:24AM -0700, Victor Mote wrote: Jeremias Maerki wrote: While looking for material on page breaking I found several references to this document: http://wwwlib.umi.com/dissertations/fullcit/8124134 Does anyone know if it's worth ordering and waiting for it? By looking for this reference, I found the following article: www.pi6.fernuni-hagen.de/publ/tr234.pdf It's entitled 'On the Pagination of Complex Documents' (actually it's also referencing Plass). I've read parts of this article and it seems interesting. It's well written, quite easy to understand and provides a better algorithm than TeX's one. And it's also more recent. It doesn't agree with Plass about the NP-hard problem of page layout. Indeed that depends on the formula we are using to estimate the badness of a layout. It proposes another formula which seems more reasonable and better corresponds to a reader's expectations. Perhaps it could provide a good basis? I don't have Knuth's 'Digital Typography' (I'm considering purchasing it). It may be worthwhile to compare this article with what is in the book. The TeX program is described in 'TeX The Program'. That text is weaved into the program code according to Knuth's literate programming system. It can be freely extracted from the program code. I've already done it, and AFAICT it's not easy to find the way in all the TeX stuff (actually nothing is easy with TeX ;-)). I can send you the pdf file if you want, feel free to ask me. Be warned, though: it's a 535 pages (!) document that entirely describes the TeX program. I've started looking into it, and, well, it's rather cryptic. It's very close to the implementation, it seems to be difficult to get a general idea of what it's doing. But I'll investigate a bit more. And, as Simon wrote, TeX is excellent in line-breaking but not as good in page-breaking. It was implemented in the early 80's when memory was expensive. However, it was written with typographic quality in mind, and that's why it may be a good idea to try getting some hints from it. Vincent

### Re: Plass, Michael Frederick: Optimal Pagination Techniques for Automatic Typesetting Systems

Vincent Hennebert wrote: By looking for this reference, I found the following article: www.pi6.fernuni-hagen.de/publ/tr234.pdf It's entitled 'On the Pagination of Complex Documents' (actually it's also referencing Plass). There's another article, where the top level of the algorithm is presented (page 8). http://www.pi6.fernuni-hagen.de/publ/tr205.pdf Those 2 articles are summaries of a book. The link to command the book can be found at the bottom of http://web.informatik.uni-bonn.de/I/research/Pagination/ Renaud