Plass, Michael Frederick: Optimal Pagination Techniques for Automatic Typesetting Systems

2005-03-03 Thread Jeremias Maerki
While looking for material on page breaking I found several references
to this document:

http://wwwlib.umi.com/dissertations/fullcit/8124134

Does anyone know if it's worth ordering and waiting for it? The
unfortunate thing is that they don't seem to have a PDF version that I
could download immediately for a reasonable fee.

Thanks,
Jeremias Maerki



Re: [XML Graphics - FOP Wiki] Updated: PageLayout

2005-03-03 Thread Jeremias Maerki
I've tried to think this case through. Simon's suggestion for a special
penalty is intriguing. It handles the case for table borders where we
don't simply have an x or 0 situation (like penalty and glue provide),
but we have a x or y situation.

On the other side Luca is probably right that we should try to handle as
much as possible with the existing elements. However, Luca's example
does not fully resolve in my brain. The penalty, for example, must not
be infinite or it will not be eligible for break possibility. A legal
break is defined by Knuth as a number b such that either
(i) xb is a penalty item with pb  infinity, or (ii) xb is a glue item
and xb-1 is a box item. As far as I can see Luca's example doesn't
contain any legal break, but I guess that was simply an oversight.

The big problem I still have with both your examples is that the table
header is very special in terms of the standard Knuth model. This model
doesn't allow for conditional items at the beginning of a line. What
Luca did in his example looks to me like forcing the model to do
something it wasn't designed for. I'm a bit sceptical that the code will
be able to identify such special conditions reliably.

In this case I think we would have to introduce a special element that
gets active if such a penalty was triggered on the last line/page (a
carryover-penalty or something like that).

In the end I think that all the elements after a chosen break may be
invalid anyway if the available IPD changes (for example when the n+1
page is landscape while the page n was portrait). In this case all
non-finalized elements have to be recalculated due to different break
decisions in the line layout managers and different values coming from
page-number(-citation) elements. [1]

I wonder how other systems cope with this. Information on this seems to
be very rare, at least when googling with the Knuth model in mind.

Head smoking, digging deeper...

[1] http://wiki.apache.org/xmlgraphics-fop/PageLayout/KnuthElementEvaluation

On 28.02.2005 10:15:39 Luca Furini wrote:
 
 Simon Pepping wrote:
 
 +=== Space specifiers ===
 +
 +When the space specifiers resolve to zero around a page break, we are
 +in the same situation as that of a word space in line breaking. It is
 +represented by the sequence `box - glue - box`.
 
 I add just a few thoughts about this subject.
 
 If there cannot be a break between the two block (the first has
 keep-with-next || the second has keep-with-previous || their block father
 has keep-together), the representation can be box - infinite penalty -
 glue - box.
 
 +=== Possible page break between content elements ===
 +
 +Here the most general situation is that when the content is different
 +with and without page break:
 + * content Cn when there is no page break,
 + * content Ca at the end of the page before the page break,
 + * content Cb at the start of the page after the page break.
 +
 +An example of this situation is a page break between table rows:
 +
 +{{{
 +no page break:page break:
 +
 +- -
 +  row 1 row 1
 +- -
 + border n  border a
 +- -
 +  row 2footer
 +- -
 +  page break
 +  -
 +   header
 +  -
 +   border b
 +  -
 +row 2
 +  -
 +}}}
 +
 +This situation cannot be dealt with using Knuth's box/glue/penalty
 +model.
 
 Maybe there is no need to create new kinds of elements (not that it's
 forbidden :-) , only the fewer they are, the simpler the algorithm is).
 
 Header and footer, with their borders, are duplicated around each break if
 table-omit-*-at-break is false; so, at each break the total height
 increases by (border a + footer + header + border b) and decreases by
 border n.
 
 Here is a different representation which uses normal penalties; there are
 two rows whose bpd is h1 and h2, header and footer with bpd hH and hF,
 border before the footer , border after the header and border between rows
 hA, hB and hN.
 
 box(h1) - penalty(inf, hH + hA + hB + hF) - glue(hN) - box(h2) - box(hB +
 hF) - box(hH + hA)
 
 If there is no break, the overall bpd is (hH + hA + h1 + hN + h2 + hB +
 hF), otherwise the first piece has bpd (hH + hA + h1 + hB + hF) and the
 second one (hH + hA + h2 + hB + hF), and the border between the rows is
 ignored.
 
 The elements representing the header and its border are moved at the end
 of the sequence, but I don't think this is could be a real problem: the
 TableLayoutManager would place it at its right place when adding areas.
 
 Regards
 Luca
 



Jeremias Maerki



RE: Plass, Michael Frederick: Optimal Pagination Techniques for Automatic Typesetting Systems

2005-03-03 Thread Victor Mote
Jeremias Maerki wrote:

 While looking for material on page breaking I found several 
 references to this document:
 
 http://wwwlib.umi.com/dissertations/fullcit/8124134
 
 Does anyone know if it's worth ordering and waiting for it? 
 The unfortunate thing is that they don't seem to have a PDF 
 version that I could download immediately for a reasonable fee.

Wow. This looks like it is very valuable. I have ordered it for my own use,
and I'll be glad to give you a book review when it arrives to help you
decide whether it is worthwhile for you or not.

I am especially interested in the summary's comment: For certain simple
badness functions, the pagination problem is NP-complete; Dealing with that
challenge is the likely tricky spot in all of this. My intuition has always
been that the page-breaking problem is much more complicated than the
line-breaking one, partly because lines must be laid out to even think about
page-breaking (and line lengths can change as they move around), partly
because you are effectively working with changes in two dimensions instead
of one, and partly because there seem to me to be a lot more variables in
the problem. I am hoping to find some insight into the detection and
workarounds for the NP-complete situations.

Note that Stanford is Knuth's school, the date year is the same as that of
Chapter 3 of Knuth's Digital Typography, and that the author is the
co-author of that article. It may be possible to infer the same information
from looking at the TeX source code. Also, another source of similar
information would be Volume I of Knuth's Computers and Typesetting, aka
The TeXbook. It is essentially a commentary on TeX, by Knuth. Chapter 15
is entitled How TeX Makes Lines into Pages.

You guys are way ahead of me in terms of thinking about how to implement
this stuff. As you know, my approach has been to leave this stuff for last,
preferring instead to solve the outer-layer problems first, and provide for
multiple implementations that can be improved in parallel. However, I have a
great interest in your efforts, and will be glad to help any way that I can.
And, FWIW, I think you are on the right general track, in this regard at
least.

Victor Mote



Re: Plass, Michael Frederick: Optimal Pagination Techniques for Automatic Typesetting Systems

2005-03-03 Thread Jeremias Maerki

On 03.03.2005 16:19:24 Victor Mote wrote:
 Jeremias Maerki wrote:
 
  While looking for material on page breaking I found several 
  references to this document:
  
  http://wwwlib.umi.com/dissertations/fullcit/8124134
  
  Does anyone know if it's worth ordering and waiting for it? 
  The unfortunate thing is that they don't seem to have a PDF 
  version that I could download immediately for a reasonable fee.
 
 Wow. This looks like it is very valuable. I have ordered it for my own use,
 and I'll be glad to give you a book review when it arrives to help you
 decide whether it is worthwhile for you or not.

I will probably order it anyway because I need it yesterday. :-) That's
why I wanted the PDF version which they don't seem to have. Sigh.

 I am especially interested in the summary's comment: For certain simple
 badness functions, the pagination problem is NP-complete; Dealing with that
 challenge is the likely tricky spot in all of this. My intuition has always
 been that the page-breaking problem is much more complicated than the
 line-breaking one, partly because lines must be laid out to even think about
 page-breaking (and line lengths can change as they move around), partly
 because you are effectively working with changes in two dimensions instead
 of one, and partly because there seem to me to be a lot more variables in
 the problem. I am hoping to find some insight into the detection and
 workarounds for the NP-complete situations.

I've found other references to discussion about communication between
line and page breaking algorithms. But that stuff was mostly
overview-style and written in a language I don't understand well:
universitary language. That's probably the first time I regret stopping
university after one semester because I hate mathematics. :-)

 Note that Stanford is Knuth's school, the date year is the same as that of
 Chapter 3 of Knuth's Digital Typography, and that the author is the
 co-author of that article. It may be possible to infer the same information
 from looking at the TeX source code. Also, another source of similar
 information would be Volume I of Knuth's Computers and Typesetting, aka
 The TeXbook. It is essentially a commentary on TeX, by Knuth. Chapter 15
 is entitled How TeX Makes Lines into Pages.

I haven't dared look into the TeX source code, yet, but I've read most
of the chapter you mention. Didn't really help because there are many
many TeX-specific things in there.

 You guys are way ahead of me in terms of thinking about how to implement
 this stuff. As you know, my approach has been to leave this stuff for last,
 preferring instead to solve the outer-layer problems first, and provide for
 multiple implementations that can be improved in parallel. However, I have a
 great interest in your efforts, and will be glad to help any way that I can.
 And, FWIW, I think you are on the right general track, in this regard at
 least.

I very much hope so. But it becomes more and more apparent that this
will be the greatest challenge in my programmer's life. Wow indeed.


Jeremias Maerki



Re: Plass, Michael Frederick: Optimal Pagination Techniques for Automatic Typesetting Systems

2005-03-03 Thread Glen Mazza
Happy to see how you have reprioritized your efforts
over the past two months [1], and much, much for the
better.

Glen

--- Jeremias Maerki [EMAIL PROTECTED] wrote:
 
 I very much hope so. But it becomes more and more
 apparent that this
 will be the greatest challenge in my programmer's
 life. Wow indeed.
 
 
 Jeremias Maerki
 
 

[1] http://marc.theaimsgroup.com/?l=fop-devm=110495579414655w=2


Re: Plass, Michael Frederick: Optimal Pagination Techniques for Automatic Typesetting Systems

2005-03-03 Thread Jeremias Maerki
Actually, I haven't. The RTF edition idea (which I haven't discarded,
yet, just postponed due to time shortage) is still very high on my
private priority list. As you can guess there's currently another one
which comes first.

On 03.03.2005 18:31:12 Glen Mazza wrote:
 Happy to see how you have reprioritized your efforts
 over the past two months [1], and much, much for the
 better.
 
 Glen
 
 --- Jeremias Maerki [EMAIL PROTECTED] wrote:
  
  I very much hope so. But it becomes more and more
  apparent that this
  will be the greatest challenge in my programmer's
  life. Wow indeed.
  
  
  Jeremias Maerki
  
  
 
 [1] http://marc.theaimsgroup.com/?l=fop-devm=110495579414655w=2



Jeremias Maerki



Re: Skype-conference on page-breaking?

2005-03-03 Thread Jeremias Maerki
I've bought some SkypeOut credits now. Funny thing: It's cheaper to call
Simon in the Netherlands than to call someone in Lucerne via PSTN. 

Anyway, I'd like to ask if we could hold to a brainstorming conference
call on page breaking either Sunday evening or next Monday or Tuesday
somewhere between 8:00 and 24:00 CET. Of course, on my wish list there
are Simon, Finn and Luca. I'm happy to call either of you on your normal
phone via SkypeOut if you don't have broadband. I hope I can get at
least one of you three on the line. Others are invited to listen in and
contribute, of course. Max. number in the conference is four people with
Skype.

On 01.03.2005 23:31:16 Jeremias Maerki wrote:
 Maybe I could hook you into a Skype conference by using SkypeOut. It's pretty
 cheap to call to the Netherlands. According to the FAQ this is possible.
 
 On 01.03.2005 22:26:50 Simon Pepping wrote:
  On Tue, Mar 01, 2005 at 03:09:46PM +0100, Jeremias Maerki wrote:
   To speed things up could we hold a conference (using Skype, for example)
   to discuss further details on page-breaking? I'd volunteer to sum up any
   results during that discussion for the archives. I have Finn on my Skype
   radar already.
  
  I do not have a broadband connection, and therefore no Skype or other
  VoIP.
 
 
 Jeremias Maerki



Jeremias Maerki



Re: Skype-conference on page-breaking?

2005-03-03 Thread Simon Pepping
On Thu, Mar 03, 2005 at 08:34:54PM +0100, Jeremias Maerki wrote:
 I've bought some SkypeOut credits now. Funny thing: It's cheaper to call
 Simon in the Netherlands than to call someone in Lucerne via PSTN. 
 
 Anyway, I'd like to ask if we could hold to a brainstorming conference
 call on page breaking either Sunday evening or next Monday or Tuesday
 somewhere between 8:00 and 24:00 CET. Of course, on my wish list there
 are Simon, Finn and Luca. I'm happy to call either of you on your normal
 phone via SkypeOut if you don't have broadband. I hope I can get at
 least one of you three on the line. Others are invited to listen in and
 contribute, of course. Max. number in the conference is four people with
 Skype.

Sunday evening is OK. Monday and Tuesday after working hours is OK. I
could be available from 16.00 hrs, but I would prefer after 19.00 hrs
CET. There is no way I can do this at the office.

Regards, Simon

-- 
Simon Pepping
home page: http://www.leverkruid.nl



Re: Plass, Michael Frederick: Optimal Pagination Techniques for Automatic Typesetting Systems

2005-03-03 Thread Simon Pepping
On Thu, Mar 03, 2005 at 08:19:24AM -0700, Victor Mote wrote:
 Jeremias Maerki wrote:
 
  While looking for material on page breaking I found several 
  references to this document:
  
  http://wwwlib.umi.com/dissertations/fullcit/8124134
  
  Does anyone know if it's worth ordering and waiting for it? 
  The unfortunate thing is that they don't seem to have a PDF 
  version that I could download immediately for a reasonable fee.
 
 Wow. This looks like it is very valuable. I have ordered it for my own use,
 and I'll be glad to give you a book review when it arrives to help you
 decide whether it is worthwhile for you or not.

I do not know it. It sounds like I should buy it as well.
 
 Note that Stanford is Knuth's school, the date year is the same as that of
 Chapter 3 of Knuth's Digital Typography, and that the author is the
 co-author of that article. It may be possible to infer the same information
 from looking at the TeX source code. Also, another source of similar
 information would be Volume I of Knuth's Computers and Typesetting, aka
 The TeXbook. It is essentially a commentary on TeX, by Knuth. Chapter 15
 is entitled How TeX Makes Lines into Pages.

Note that The TeXbook is TeX's user guide. Yes, Knuth's users are
quite advanced. It was my first book in the direction of computers,
and one of the most inspirational I have read.

The TeX program is described in 'TeX The Program'. That text is weaved
into the program code according to Knuth's literate programming
system. It can be freely extracted from the program code. A TeX
distribution like TeXLive contains the tools to do this. I intend to
do so soon. If you want to do it yourself, TeXLive is available from
the TUG website, www.tug.org. The TeX source code itself is available
from the CTAN repository, but I fear that you have to do some work to
set up all the tools. It is up to yourself to decide whether knowing
TeX's implementation is useful. It is a best-fit algorithm. There is
no look-ahead. For example, TeX is not able to balance two facing
pages (or two columns on a page, which for TeX is the same). I guess
that a dissertation like that cited above contains much more
information than implemented in TeX.

Regards, Simon

-- 
Simon Pepping
home page: http://www.leverkruid.nl



RE: Plass, Michael Frederick: Optimal Pagination Techniques for Automatic Typesetting Systems

2005-03-03 Thread Victor Mote
Simon Pepping wrote:

 Note that The TeXbook is TeX's user guide. Yes, Knuth's users 
 are quite advanced. It was my first book in the direction of 
 computers, and one of the most inspirational I have read.
 
 The TeX program is described in 'TeX The Program'. That text 

Oops. You're right. That is volume 2 from the same Computers and
Typesetting series.

 set up all the tools. It is up to yourself to decide whether 
 knowing TeX's implementation is useful. It is a best-fit 
 algorithm. There is no look-ahead. For example, TeX is not 
 able to balance two facing pages (or two columns on a page, 
 which for TeX is the same). I guess that a dissertation like 
 that cited above contains much more information than 
 implemented in TeX.

I'm not sure. The general TeX page-breaking algorithm is discussed in the
paragraphs before Appendix A of Chapter 3 of Digital Typography. The
general box/glue/penalty model is used, but only the current page is
considered. So I think the difference between best-fit and total-fit (as
described here anyway) is the amount of look-ahead itself, not so much the
algorithm. This is why I thought Finn's (IIRC) idea of a variable look-ahead
makes sense. A look-ahead of zero pages is a best-fit, a look-ahead of all
pages is a total-fit. But the algorithm is the same.

Anyway, I agree that the paper is probably the best source, but wanted to
give Jeremias some options.

Victor Mote



RE: Plass, Michael Frederick: Optimal Pagination Techniques for Automatic Typesetting Systems

2005-03-03 Thread Victor Mote
Victor Mote wrote:

 Oops. You're right. That is volume 2 from the same Computers 
 and Typesetting series.

Er, it is actually volume B.

Victor Mote



Re: Plass, Michael Frederick: Optimal Pagination Techniques for Automatic Typesetting Systems

2005-03-03 Thread Vincent Hennebert
Hi Fop Team,
Simon Pepping a écrit :
On Thu, Mar 03, 2005 at 08:19:24AM -0700, Victor Mote wrote:
Jeremias Maerki wrote:
While looking for material on page breaking I found several 
references to this document:

http://wwwlib.umi.com/dissertations/fullcit/8124134
Does anyone know if it's worth ordering and waiting for it? 

By looking for this reference, I found the following article:
www.pi6.fernuni-hagen.de/publ/tr234.pdf
It's entitled 'On the Pagination of Complex Documents' (actually it's also 
referencing Plass). I've read parts of this article and it seems interesting. 
It's well written, quite easy to understand and provides a better algorithm than 
TeX's one. And it's also more recent. It doesn't agree with Plass about the 
NP-hard problem of page layout. Indeed that depends on the formula we are using 
to estimate the badness of a layout. It proposes another formula which seems 
more reasonable and better corresponds to a reader's expectations.
Perhaps it could provide a good basis? I don't have Knuth's 'Digital Typography' 
(I'm considering purchasing it). It may be worthwhile to compare this article 
with what is in the book.

The TeX program is described in 'TeX The Program'. That text is weaved
into the program code according to Knuth's literate programming
system. It can be freely extracted from the program code.

I've already done it, and AFAICT it's not easy to find the way in all the TeX 
stuff (actually nothing is easy with TeX ;-)). I can send you the pdf file if 
you want, feel free to ask me.
Be warned, though: it's a 535 pages (!) document that entirely describes the TeX 
program. I've started looking into it, and, well, it's rather cryptic. It's very 
close to the implementation, it seems to be difficult to get a general idea of 
what it's doing. But I'll investigate a bit more.
And, as Simon wrote, TeX is excellent in line-breaking but not as good in 
page-breaking. It was implemented in the early 80's when memory was expensive.
However, it was written with typographic quality in mind, and that's why it may 
be a good idea to try getting some hints from it.

Vincent


Re: Plass, Michael Frederick: Optimal Pagination Techniques for Automatic Typesetting Systems

2005-03-03 Thread Renaud Richardet
Vincent Hennebert wrote:
 By looking for this reference, I found the following article:
 www.pi6.fernuni-hagen.de/publ/tr234.pdf
 It's entitled 'On the Pagination of Complex Documents' (actually it's also
 referencing Plass). 

There's another article, where the top level of the algorithm is
presented (page 8).
http://www.pi6.fernuni-hagen.de/publ/tr205.pdf

Those 2 articles are summaries of a book. The link to command the book
can be found at the bottom of
http://web.informatik.uni-bonn.de/I/research/Pagination/

Renaud