Re: [NTG-context] distributed / parallel TeX?

2008-12-17 Thread Hans Hagen

Lars Huttar wrote:

On 12/16/2008 3:15 PM, luigi scarso wrote:


On Tue, Dec 16, 2008 at 9:08 AM, Taco Hoekwater t...@elvenkind.com wrote:


Hi Lars,


Lars Huttar wrote:



So the question comes up, can TeX runs take advantage of
parallelized or
distributed processing?


No. For the most part, this is because of another requisite: for
applications to make good use of threads, they have to deal with a
problem that can be parallelized well. And generally speaking,
typesetting  does not fall in this category. A seemingly small change
on page 4 can easily affect each and every page right to the end
of the document.


Also
3.11 Theory of page breaking
http://www.cs.utk.edu/~eijkhout/594-LaTeX/handouts/TeX%20LaTeX%20course.pdf


Certainly that is a tough problem (particularly in regard to laying out
figures near the references to them). But again, if you can break down
the document into chunks that are fairly independent of each other (and
you almost always can for large texts), this problem seems no worse for
distributed processing than for sequential processing. For example, the
difficult part of laying out figures in Section 1 is confined to Section
1; it does not particularly interact with Section 2. References in
Section 2 to Section 1 figures are going to be relatively distant from
those figures regardless of page breaking decisions. Thus the difficult
problem of page breaking is reduced to the sequential-processing case...
still a hard problem, but one that can be attacked in chunks. Indeed,
the greater amount of CPU time per page that is made available through
distributed processing may mean that the algorithms can do a better job
of page breaking than through sequential processing.


you need to keep an eye on where tex spends its time; much is 
related to loading fonts, reading files, saving output, etc, and with 
multiple threads one would have to coordinate that and make sure the 
overall time spent does not become larger


for instance, in your document making these large tables takes a while 
only because bTABLE is not that fast, so when at some point i can redo 
part of such mechanisms in lua we might gain way more runtime than by 
running distributed


Hans

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
 tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
 | www.pragma-pod.nl
-


Re: [NTG-context] distributed / parallel TeX?

2008-12-17 Thread Taco Hoekwater


Martin Schröder wrote:
 2008/12/16 Lars Huttar lars_hut...@sil.org:
 Good point... although doesn't the page optimization feed back into
 paragraph layout?
 
 No. :-(

But from Lars' POV, that is good :)

There are some interesting ideas in this discussion, but with
the current state of the code base all of this will be exceedingly
difficult (especially because of all the synchronisation issues).

Unless someone wants to work on this idea him/herself (and that
would be great, there are not nearly enough people working on TeX
development!), you could remind me, say, two years from now?

Best wishes,
Taco








Re: [NTG-context] distributed / parallel TeX?

2008-12-17 Thread luigi scarso

 you need to keep an eye on where tex spends its time on, and much is
 related to loading fonts, reading files, saving output, etc and with
 multiple threads one would have to coordinate that and make sure the time
 spent on it does not become larger overall

 for instance, in your document making these large tables takes a while only
 because bTABLE is not that fast, so when at some point i can redo part of
 such mechanisms in lua we might gain way more runtime than by running
 distributed

 Hans


If you are under Linux
you can try /dev/shm
http://www.cyberciti.biz/tips/what-is-devshm-and-its-practical-usage.html

-- 
luigi


Re: [NTG-context] distributed / parallel TeX?

2008-12-17 Thread Lars Huttar
Thanks, everybody, for the discussion on running TeX distributed / in
parallel.
I am much educated about the state of the art. :-)

Summary ...

- There is plenty of optimization that normally can be done. If a
ConTeXt run is taking a really long time, chances are that something is
not being done according to the design.

- For most (current) purposes, documents are small enough and ConTeXt is
fast enough that the effort to automate distribution of typesetting runs
may not be worthwhile. On the other hand, the usage of TeX might expand
if greater throughput were available.

- However, as things stand now, one can always divide documents up by
hand, typeset the parts independently, and stitch them back together
using tools such as divert/undivert. One can even design a document
stipulating that the canonical typesetting process is to typeset the
sections independently; then the sections can never affect each other,
except for explicitly added inter-section effects such as page reference
updates.
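
For illustration, here is a minimal sketch of one way to arrange this
with ConTeXt's project structure (the file names are hypothetical, not
from this thread): each section is a component that can be typeset on
its own, while a shared environment file keeps the setups identical
whether a section runs alone or as part of the whole product.

% env.tex: shared setups (fonts, layout, macros)
\startenvironment env
  \setupbodyfont[10pt]
\stopenvironment

% section-1.tex: can be typeset stand-alone
\startcomponent section-1
\environment env
  ... section one ...
\stopcomponent

% book.tex: the whole book, built from the same components
\startproduct book
\environment env
  \component section-1
  \component section-2
\stopproduct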

If you're not aware of MarkMail, it's a handy place to browse / search
archives of mailing lists. This thread can be found at
http://markmail.org/search/?q=ntg+context+distributed+parallel

On 12/17/2008 2:47 AM, Taco Hoekwater wrote:
 There are some interesting ideas in this discussion, but with
 the current state of the code base all of this will be exceedingly
 difficult (especially because of all the synchronisation issues).

 Unless someone wants to work on this idea him/herself (and that
 would be great, there are not nearly enough people working on TeX
 development!), you could remind me, say, two years from now?

Sure. Thank you for your interest.

I wasn't asking for someone to implement new features for this, though I
would be happy to see it happen if it is worthwhile for the community.

As Dr. Dobb's says, "Single core systems are history"
(http://www.ddj.com/hpc-high-performance-computing/207100560). Software
that can take advantage of multiple cores (or threads, or distributed
nodes) will continue to scale. Of course some effort, and often some
adjustment, is necessary to enable programs to effectively use parallelism.

I'll create a page at http://wiki.contextgarden.net/Parallel summarizing
this discussion if that's OK.

Regards,
Lars



Re: [NTG-context] distributed / parallel TeX?

2008-12-16 Thread Taco Hoekwater


Hi Lars,

Lars Huttar wrote:

Hello,

We've been using TeX to typeset a 1200-page book, and at that size, the
time it takes to run becomes a big issue (especially with multiple
passes... about 8 on average). It takes us anywhere from 80 minutes on
our fastest machine, to 9 hours on our slowest laptop.


You should not need an average of 8 runs unless your document is
ridiculously complex and I am curious what you are doing (but that
is a different issue from what you are asking).


So the question comes up, can TeX runs take advantage of parallelized or
distributed processing? 


No. For the most part, this is because of another requisite: for
applications to make good use of threads, they have to deal with a
problem that can be parallelized well. And generally speaking,
typesetting  does not fall in this category. A seemingly small change
on page 4 can easily affect each and every page right to the end
of the document.

About the only safe thing that can be threaded is the reading of
resources (images and fonts) and, mostly because of the small gain,
nobody has been working on that, as far as I know.


parallel pieces so that you could guarantee that you would get the same
result for section B whether or not you were typesetting the whole book
at the same time?


if you are willing to promise yourself that all chapters will be exactly
20 pages - no more, no less - then you can split the work off into
separate job files yourself and take advantage of a whole server
farm. If you can't ...

Best wishes,
Taco


Re: [NTG-context] distributed / parallel TeX?

2008-12-16 Thread Hans Hagen

Lars Huttar wrote:

Hello,

We've been using TeX to typeset a 1200-page book, and at that size, the
time it takes to run becomes a big issue (especially with multiple
passes... about 8 on average). It takes us anywhere from 80 minutes on
our fastest machine, to 9 hours on our slowest laptop.


often there are ways to optimize the document style

btw, on my laptop mk.tex, which loads a bunch of really big fonts (esp punk 
slows it down), does 10 pages/second (242 pages currently), so your setup is 
probably not that efficient



So the question comes up, can TeX runs take advantage of parallelized or
distributed processing? As many in the computer industries are aware,
processor speed (clock rate) has plateaued; it is not going to continue
rising at the rate it had been. Hence the common move to dual-core,
quad-core, etc. machines. But applications in general cannot take
advantage of multiple cores to speed their work unless they are
architected to take advantage of them.

We googled around a bit but were surprised not to find any real
references to efforts at running TeX in parallel or on distributed
networks or clusters. Wouldn't this be something that a lot of people
would find useful? Or does everyone only use TeX for typesetting short
papers?


it all depends on what you process; for simple docs tex is rather fast:

\starttext

\setuplayout[page]

\dorecurse{1000}{\null\page}

\stoptext

such a run takes in mkiv:

5.944 seconds, 1000 processed pages, 1000 shipped pages, 168.237 
pages/second



Sure, you can use manual tricks to speed up TeX processing.
You can comment out sections of a document, or select them via modes.
But then you have to remember where you did the commenting out, so you
can reverse it. And you have no guarantees as to whether the
inclusion/exclusion of section B will affect the layout of section C or not.


often it's inefficient font calls that slow down the job (or big 
graphics that one can skip including in a first pass)
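
as an aside, the modes mentioned above would look something like this
(a minimal sketch; the mode and file names are made up):

% enable from the command line: context --mode=sectionB main
% or in the file itself:
\enablemode[sectionB]

% the block is only typeset when the mode is active
\startmode[sectionB]
  \input section-b
\stopmode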



Wouldn't it be nice if TeX (or a TeX wrapper, or macro package, or
typesetting system) could take care of this for you?


maybe in the future we can do some parallelization

also, i'm thinking of 'one run with prerolls' but it has no high 
priority (maybe i'll do it when i need it for a project)



What if you had a language -- or a few extensions to existing languages
-- to give your typesetting engine hints or commands about where to
split up your long document into fairly-independent chunks? What if you
designed your document specifically to be typeset in independent,
parallel pieces so that you could guarantee that you would get the same
result for section B whether or not you were typesetting the whole book
at the same time?


there are quite a few dependencies between pages (take cross 
refs and how they might influence a next run)



What if the typesetting system automated the stitching-back-together
process of the chunks, gathering page reference info from each chunk to
inform the next iteration of typesetting the other chunks?


this is an option when you have to reprocess parts of the document often


Has anyone been working on this already? It seems like it must have been
discussed, but I don't know where to go to look for that discussion.


if i were you i'd first look into optimizing the style

Hans

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
 tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
 | www.pragma-pod.nl
-


Re: [NTG-context] distributed / parallel TeX?

2008-12-16 Thread Hans Hagen

Aditya Mahajan wrote:

On Tue, 16 Dec 2008, Hans Hagen wrote:


Lars Huttar wrote:

Hello,

We've been using TeX to typeset a 1200-page book, and at that size, the
time it takes to run becomes a big issue (especially with multiple
passes... about 8 on average). It takes us anywhere from 80 minutes on
our fastest machine, to 9 hours on our slowest laptop.


often there are ways to optimize the document style


In Latex, there is a package called mylatex, which allows you to create 
a format consisting of your preamble. Then you can call latex with your 
own format and this speeds up things.


not on this document, since the processing takes far more time than the startup

in jelle's doc it's the many many many local font definitions and 
some 800 metapost graphics that are the culprits


- define fonts beforehand
- use unique mpgraphic when possible

i changed the definitions a bit and now get 5 pages per second on my 
laptop in luatex; xetex processes the pages a bit faster but spends way 
more time on the mp part


(of course 1450 pages of columnsets and multipage bTABLE's also cost a 
bit of time)


This approach did not provide a significant speed improvement in latex 
for me, and I don't know whether it will do so in context. Hans and 
Taco, do you think that creating a personal format and possibly also 
dumping some font related info could provide a tradeoff between 
processing speed and disk space?


it depends, you might win maybe a second per run at startup in 
cases where you have thousands of small runs, but often using a small 
tree (like the minimals) pays off more


Hans

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
 tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
 | www.pragma-pod.nl
-


Re: [NTG-context] distributed / parallel TeX?

2008-12-16 Thread Aditya Mahajan

On Tue, 16 Dec 2008, Hans Hagen wrote:


Lars Huttar wrote:

Hello,

We've been using TeX to typeset a 1200-page book, and at that size, the
time it takes to run becomes a big issue (especially with multiple
passes... about 8 on average). It takes us anywhere from 80 minutes on
our fastest machine, to 9 hours on our slowest laptop.


often there are ways to optimize the document style


In Latex, there is a package called mylatex, which allows you to create a 
format consisting of your preamble. Then you can call latex with your own 
format and this speeds up things.


This approach did not provide a significant speed improvement in latex for 
me, and I don't know whether it will do so in context. Hans and Taco, do 
you think that creating a personal format and possibly also dumping some 
font related info could provide a tradeoff between processing speed and 
disk space?


Aditya


Re: [NTG-context] distributed / parallel TeX?

2008-12-16 Thread Hans Hagen

Lars Huttar wrote:


We have close to 7000 mpgraphics, and they add about 15 minutes to the
run time.


most of them are the same so reusing them made sense


But the run time was already quite long before we started using those.


- define fonts beforehand


OK, we will look into this. I'm sure Jelle knows about this but I'm a
noob. I'm pretty sure we are not *loading* fonts every time, but maybe
we're scaling fonts an unnecessary number of times.
For example, we have the following macro, which we use thousands of times:
\def\LN#1{{\switchtobodyfont[SansB,\LNfontsize]{#1}}}


indeed this will define the scaled ones again and again (whole sets of 
them, since you use a complete switch); internally tex reuses them, but it 
can only do so when they're already defined



Would it help much to instead use
\definefont[SansBLN][... at \LNfontsize]
and then
\def\LN#1{{\SansBLN{#1}}}
?


indeed:

\definefont[SansBLN][... at \LNfontsize]

but no extra { } needed:

\def\LN#1{{\SansBLN#1}}
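
for concreteness, the complete change would look something like this
(the typeface name SansBold is a placeholder here; the real name is
elided above):

% before: every call builds a freshly scaled set of bodyfonts
\def\LN#1{{\switchtobodyfont[SansB,\LNfontsize]{#1}}}

% after: scale once, reuse the single instance everywhere
\definefont[SansBLN][SansBold at \LNfontsize] % placeholder font name
\def\LN#1{{\SansBLN#1}}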


- use unique mpgraphic when possible


I would be interested to know if this is possible in our situation. Most
of our mpgraphics are due to wanting thick-and-thin or single-and-double
borders on tables, which are not natively supported by the ConTeXt table
model.


i sent jelle the patched files


The advice I received said to define each mpgraphic using
\startuseMPgraphic (we have about 18 of these), associate them with
overlays using \defineoverlay (again, we have 18), and then use them in
table cells using statements like
\setupTABLE[c][first][background={LRtb}]
Empirically, this seems to end up using one mpgraphic per table cell,
hence our thousands of mpgraphics. I don't know why a new mpgraphic
would be created for each cell. Can someone suggest a way to avoid this?


metafun manual: unique mp graphics
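
a sketch of that mechanism (the border drawing here is illustrative,
not the actual code from this document): a unique graphic is rendered
once per distinct size and then reused, so equally sized cells share
one instance instead of getting one graphic each

\startuniqueMPgraphic{LRtb}
  % OverlayBox is the metafun path matching the cell's size
  draw OverlayBox withpen pencircle scaled 1pt ;
\stopuniqueMPgraphic

\defineoverlay[LRtb][\uniqueMPgraphic{LRtb}]

\setupTABLE[c][first][background={LRtb}]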


i changed the definitions a bit and now get 5 pages per second on my
laptop in luatex; xetex processes the pages a bit faster but spends way
more time on the mp part


My last run gave about 0.25 pages per second on our fastest server, when
taking into account multiple passes; that comes out to about 2 pps for
--once.


the patched files do 5-10 pps on my laptop (was > 1 sec pp) so an 
improvement factor of at least 5 is possible


there are probably other optimizations possible but i cannot spend too 
much time on it


Hans

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
 tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
 | www.pragma-pod.nl
-


Re: [NTG-context] distributed / parallel TeX?

2008-12-16 Thread Lars Huttar
On 12/16/2008 2:08 AM, Taco Hoekwater wrote:
 
 Hi Lars,
 
 Lars Huttar wrote:
 Hello,

 We've been using TeX to typeset a 1200-page book, and at that size, the
 time it takes to run becomes a big issue (especially with multiple
 passes... about 8 on average). It takes us anywhere from 80 minutes on
 our fastest machine, to 9 hours on our slowest laptop.
 
 You should not need an average of 8 runs unless your document is
 ridiculously complex and I am curious what you are doing (but that
 is a different issue from what you are asking).
 
 So the question comes up, can TeX runs take advantage of parallelized or
 distributed processing? 
 
 No. For the most part, this is because of another requisite: for
 applications to make good use of threads, they have to deal with a
 problem that can be parallelized well. And generally speaking,
 typesetting  does not fall in this category. A seemingly small change
 on page 4 can easily affect each and every page right to the end
 of the document.

Thank you for your response.

Certainly this is true in general and in the worst case, as things stand
currently. But I don't think it has to be that way. The following could
greatly mitigate that problem:

- You could design your document *specifically* to make the parts
independent, so that the true and authoritative way to typeset them is
to typeset the parts independently. (You can do this part now without
modifying TeX at all... you just have the various sections' .tex files
input common headers / macro defs.) Then, by definition, a change in
one section cannot affect another section (except for page numbers, and
possibly left/right pages, q.v. below).

- Most large works are divisible into chunks separated by page breaks
and possibly page breaks that force a recto. This greatly limits the
effects that any section can have on another. The division (chunking)
of the whole document into fairly-separate parts could either be done
manually, or if there are clear page breaks, automatically.

- The remaining problem, as you noted, is how to fix page references
from one section to another. Currently, TeX resolves forward references
by doing a second (or third, ...) pass, which uses page information from
the previous pass. The same technique could be used for resolving
inter-chunk references and determining on what page each chunk should
start. After one pass over each of the independent chunks (ideally performed
simultaneously by separate processing nodes), page information is sent
from each node to a coordinator process. E.g. the node that processed
section two tells the coordinator that chapter 11 starts 37 pages after
the beginning of section two. The coordinator knows in what sequence the
chunks are to be concatenated, thanks to a config file. It uses this
information together with info from each of the nodes to build a table
of what page each chunk should start on, and a table giving the absolute
page number of each page reference. If pagination has changed, or is
new, this info is sent back to the various nodes for another round of
processing.

If this distributed method of typesetting a document takes 1 additional
iteration compared to doing it in series, but you get to split the
document into say 5 roughly equal parts, you could presumably get the
job done a lot quicker in spite of the extra iteration.

This is a crude description but hopefully the idea is clear enough.
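
To make the round trip concrete, here is a standalone sketch of that
coordinator loop in Lua (the scripting language of the LuaTeX toolchain
discussed in this thread). All names are hypothetical, and typeset_chunk
is a stub standing in for a real distributed TeX run:

-- chunks in concatenation order (from the config file mentioned above)
local chunks = { "front", "section-1", "section-2", "back" }

-- stub: a real version would run one pass on some node and parse back
-- { pages = n, labels = { key = page-within-chunk } }
local function typeset_chunk(name, startpage, labels)
  return { pages = 20, labels = {} }
end

local start, labels = {}, {}
for _, name in ipairs(chunks) do start[name] = 1 end

for round = 1, 10 do
  -- one round: every node typesets its chunk against the current
  -- guesses for start pages and reference targets; these runs are
  -- independent of each other, so they can happen in parallel
  local results = {}
  for _, name in ipairs(chunks) do
    results[name] = typeset_chunk(name, start[name], labels)
  end
  -- coordinator: recompute each chunk's start page and the absolute
  -- page of every label, then check whether anything moved
  local changed, page = false, 1
  for _, name in ipairs(chunks) do
    if start[name] ~= page then start[name], changed = page, true end
    for key, rel in pairs(results[name].labels) do
      local abs = page + rel - 1
      if labels[key] ~= abs then labels[key], changed = abs, true end
    end
    page = page + results[name].pages
  end
  if not changed then break end  -- pagination has stabilized
end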

 parallel pieces so that you could guarantee that you would get the same
 result for section B whether or not you were typesetting the whole book
 at the same time?
 
 if you are willing to promise yourself that all chapters will be exactly
 20 pages - no more, no less - then you can split the work off into
 separate job files yourself and take advantage of a whole server
 farm. If you can't ...

Yes, the splitting can be done manually now, and when the pain point
gets high enough, we do some manual separate TeX runs.

However, I'm thinking that for large works, there is enough gain to be
had that it would be worth systematizing the splitting process and
especially the recombining process, since the latter is more error-prone.

I think people would do it a lot more if there were automation support
for it. I know we would.

But then, maybe our situation of having a large book with dual columns
and multipage tables is not common enough in the TeX world.
Maybe others who are typesetting similar books just use commercial
WYSIWYG typesetting tools, as we did in the previous edition of this book.

Lars

Re: [NTG-context] distributed / parallel TeX?

2008-12-16 Thread Lars Huttar
On 12/16/2008 11:37 AM, Hans Hagen wrote:
 Lars Huttar wrote:
 
 We have close to 7000 mpgraphics, and they add about 15 minutes to the
 run time.
 
 most of them are the same so reusing them made sense
 
 But the run time was already quite long before we started using those.

 - define fonts beforehand

 OK, we will look into this. I'm sure Jelle knows about this but I'm a
 noob. I'm pretty sure we are not *loading* fonts every time, but maybe
 we're scaling fonts an unnecessary number of times.
 For example, we have the following macro, which we use thousands of
 times:
 \def\LN#1{{\switchtobodyfont[SansB,\LNfontsize]{#1}}}
 
 indeed this will define the scaled ones again and again (whole sets of
 them, since you use a complete switch); internally tex reuses them, but it
 can only do so when they're already defined
 
 Would it help much to instead use
 \definefont[SansBLN][... at \LNfontsize]
 and then
 \def\LN#1{{\SansBLN{#1}}}
 ?
 
 indeed:
 
 \definefont[SansBLN][... at \LNfontsize]
 
 but no extra { } needed:
 
 \def\LN#1{{\SansBLN#1}}

Thanks, we will try this.
(Jelle, since you have worked with this a lot longer than I have, please
stop me if you have concerns about my making this sort of change.)

 - use unique mpgraphic when possible

 I would be interested to know if this is possible in our situation. Most
 of our mpgraphics are due to wanting thick-and-thin or single-and-double
 borders on tables, which are not natively supported by the ConTeXt table
 model.
 
 i sent jelle the patched files

OK, I'll wait to hear from him. Are these patches to support these kinds
of borders on tables, thus no longer needing to use MPgraphics?

 The advice I received said to define each mpgraphic using
 \startuseMPgraphic (we have about 18 of these), associate them with
 overlays using \defineoverlay (again, we have 18), and then use them in
 table cells using statements like
 \setupTABLE[c][first][background={LRtb}]
 Empirically, this seems to end up using one mpgraphic per table cell,
 hence our thousands of mpgraphics. I don't know why a new mpgraphic
 would be created for each cell. Can someone suggest a way to avoid this?
 
 metafun manual: unique mp graphics

Great...
I converted our useMPgraphics to uniqueMPgraphics. This reduced our
number of mpgraphics from 7000 to 800!

Unfortunately the result doesn't look quite right... but since we may
not need to use mpgraphics anyway thanks to your patches, I'll hold off
on debugging the result.

 i changed the definitions a bit and now get 5 pages per second on my
 laptop in luatex; xetex processes the pages a bit faster but spends way
 more time on the mp part

 My last run gave about 0.25 pages per second on our fastest server, when
 taking into account multiple passes; that comes out to about 2 pps for
 --once.
 
 the patched files do 5-10 pps on my laptop (was > 1 sec pp) so an
 improvement factor of at least 5 is possible
 
 there are probably other optimizations possible but i cannot spend too
 much time on it

Thanks for all your help thus far.

Lars



Re: [NTG-context] distributed / parallel TeX?

2008-12-16 Thread Mojca Miklavec
On Tue, Dec 16, 2008 at 6:25 PM, Lars Huttar wrote:
 Most
 of our mpgraphics are due to wanting thick-and-thin or single-and-double
 borders on tables, which are not natively supported by the ConTeXt table
 model.
 The advice I received said to define each mpgraphic using
 \startuseMPgraphic (we have about 18 of these), associate them with
 overlays using \defineoverlay (again, we have 18), and then use them in
 table cells using statements like
\setupTABLE[c][first][background={LRtb}]

In that case I would suggest you use vrules/hrules to achieve the
same. As long as you don't need to use complicated graphics (like Hans
uses random frames) this should be doable and would speed up the
process enormously. What's the compile time if you omit the MP
graphics?
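
As one hedged illustration of the rule-based route (an assumption about
what this could look like, not necessarily what Mojca means): for the
plain single-rule cases, the natural table frame keys already avoid
MetaPost entirely, leaving hand-placed rules only where the
thick-and-thin or double borders actually occur.

% single rules per edge via frame keys, no MP overlay involved
\setupTABLE[c][first][leftframe=on,rightframe=on,rulethickness=.5pt]
\setupTABLE[r][first][topframe=on,bottomframe=on]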

Mojca


Re: [NTG-context] distributed / parallel TeX?

2008-12-16 Thread Martin Schröder
2008/12/16 Lars Huttar lars_hut...@sil.org:
 - You could design your document *specifically* to make the parts
 independent, so that the true and authoritative way to typeset them is
 to typeset the parts independently. (You can do this part now without
 modifying TeX at all... you just have the various sections' .tex files
 input common headers / macro defs.) Then, by definition, a change in
 one section cannot affect another section (except for page numbers, and
 possibly left/right pages, q.v. below).

True. Also with TeX, if your paragraphs are independent of each other
(i.e. they don't include references to others), they could be typeset
in parallel and then handed over to the page builder.

 - Most large works are divisible into chunks separated by page breaks
 and possibly page breaks that force a recto. This greatly limits the
 effects that any section can have on another. The division (chunking)
 of the whole document into fairly-separate parts could either be done
 manually, or if there are clear page breaks, automatically.

pdfTeX 1.50 knows about page diversions (analogous to m4's divert
and undivert). They have a lot of potential.

 page number of each page reference. If pagination has changed, or is
 new, this info is sent back to the various nodes for another round of
 processing.

Hopefully stopping at some point. If you use something like varioref,
you can end up in infinite loops. :-)

Best
   Martin


Re: [NTG-context] distributed / parallel TeX?

2008-12-16 Thread Lars Huttar
On 12/16/2008 3:31 PM, Martin Schröder wrote:
 2008/12/16 Lars Huttar lars_hut...@sil.org:
 - You could design your document *specifically* to make the parts
 independent, so that the true and authoritative way to typeset them is
 to typeset the parts independently. (You can do this part now without
 modifying TeX at all... you just have the various sections' .tex files
 input common headers / macro defs.) Then, by definition, a change in
 one section cannot affect another section (except for page numbers, and
 possibly left/right pages, q.v. below).
 
 True. Also with TeX, if your paragraphs are independent of each other
 (i.e. they don't include references to others), they could be typeset
 in parallel and then handed over to the page builder.

Good point... although doesn't the page optimization feed back into
paragraph layout?

 - Most large works are divisible into chunks separated by page breaks
 and possibly page breaks that force a recto. This greatly limits the
 effects that any section can have on another. The division (chunking)
 of the whole document into fairly-separate parts could either be done
 manually, or if there are clear page breaks, automatically.
 
 pdfTeX 1.50 knows about page diversions (analogous to m4's divert
 and undivert). They have a lot of potential.

Sounds useful. It's impressive if you can get a correct table of
contents in the first run (says
http://www.gust.org.pl/BachoTeX/2008/presentations/ms/handout.pdf)

 page number of each page reference. If pagination has changed, or is
 new, this info is sent back to the various nodes for another round of
 processing.
 
 Hopefully stopping at some point. If you use something like varioref,
 you can end up in infinite loops. :-)

But this is just a problem of typesetting with TeX in general, not
particular to parallel/distributed typesetting, right?
IIRC, Knuth even says in the TeXbook that a really pathological case
might never stabilize.

Lars


Re: [NTG-context] distributed / parallel TeX?

2008-12-16 Thread Martin Schröder
2008/12/16 Lars Huttar lars_hut...@sil.org:
 Good point... although doesn't the page optimization feed back into
 paragraph layout?

No. :-(

Best
   Martin


Re: [NTG-context] distributed / parallel TeX?

2008-12-16 Thread luigi scarso
On Tue, Dec 16, 2008 at 9:08 AM, Taco Hoekwater t...@elvenkind.com wrote:


 Hi Lars,

 Lars Huttar wrote:

 Hello,

 We've been using TeX to typeset a 1200-page book, and at that size, the
 time it takes to run becomes a big issue (especially with multiple
 passes... about 8 on average). It takes us anywhere from 80 minutes on
 our fastest machine, to 9 hours on our slowest laptop.


 You should not need an average of 8 runs unless your document is
 ridiculously complex and I am curious what you are doing (but that
 is a different issue from what you are asking).

  So the question comes up, can TeX runs take advantage of parallelized or
 distributed processing?


 No. For the most part, this is because of another requisite: for
 applications to make good use of threads, they have to deal with a
 problem that can be parallelized well. And generally speaking,
 typesetting  does not fall in this category. A seemingly small change
 on page 4 can easily affect each and every page right to the end
 of the document.


Also
3.11 Theory of page breaking
http://www.cs.utk.edu/~eijkhout/594-LaTeX/handouts/TeX%20LaTeX%20course.pdf


-- 
luigi


Re: [NTG-context] distributed / parallel TeX?

2008-12-16 Thread Lars Huttar
On 12/16/2008 3:15 PM, luigi scarso wrote:
 
 
 On Tue, Dec 16, 2008 at 9:08 AM, Taco Hoekwater t...@elvenkind.com wrote:
 
 
 Hi Lars,
 
 
 Lars Huttar wrote:
...
 So the question comes up, can TeX runs take advantage of
 parallelized or
 distributed processing?
 
 
 No. For the most part, this is because of another requisite: for
 applications to make good use of threads, they have to deal with a
 problem that can be parallelized well. And generally speaking,
 typesetting  does not fall in this category. A seemingly small change
 on page 4 can easily affect each and every page right to the end
 of the document.
 
 
 Also
 3.11 Theory of page breaking
 http://www.cs.utk.edu/~eijkhout/594-LaTeX/handouts/TeX%20LaTeX%20course.pdf

Certainly that is a tough problem (particularly in regard to laying out
figures near the references to them). But again, if you can break down
the document into chunks that are fairly independent of each other (and
you almost always can for large texts), this problem seems no worse for
distributed processing than for sequential processing. For example, the
difficult part of laying out figures in Section 1 is confined to Section
1; it does not particularly interact with Section 2. References in
Section 2 to Section 1 figures are going to be relatively distant from
those figures regardless of page breaking decisions. Thus the difficult
problem of page breaking is reduced to the sequential-processing case...
still a hard problem, but one that can be attacked in chunks. Indeed,
the greater amount of CPU time per page that is made available through
distributed processing may mean that the algorithms can do a better job
of page breaking than through sequential processing.

Lars
