Re: [tex4ht] problem with slow compilation of large latex file with large math content

2016-03-28 Thread Karl Berry
more active in supporting TeX4ht,

Help is always good.  There are plenty of open bugs at
  https://puszcza.gnu.org.ua/bugs/?group=tex4ht=browse=open
That's probably the simplest place to start.

Thanks,
Karl


Re: [tex4ht] problem with slow compilation of large latex file with large math content

2016-03-28 Thread Boris Veytsman
NMA> From: "Nasser M. Abbasi" 
NMA> Date: Mon, 28 Mar 2016 03:20:50 -0500


NMA> I just finished new timings on the same above file, after removing
NMA> the package titlesec which was the cause of the slowdown (thanks for
NMA> finding it goes to Jagath AR)


I wonder why titlesec takes so much time.  It is possible to add
profiling messages using \write16 at critical points in titlesec.

-- 
Good luck

-Boris

"I changed my headlights the other day. I put in strobe lights instead! Now
when I drive at night, it looks like everyone else is standing still ..."
-- Steven Wright


Re: [tex4ht] problem with slow compilation of large latex file with large math content

2016-03-28 Thread Nasser M. Abbasi

On 3/26/2016 2:21 AM, Nasser M. Abbasi wrote:

I put the latex file
and all the needed include files I use and the .cfg and main.mk4
and the command in one zip file. Here it is, in this folder:

http://12000.org/tmp/032616/

There is a file called compile.sh which has this line:

make4ht --lua -u -c ./nma.cfg -e ./main.mk4 report.tex "htm,3,pic-align,notoc*"




I just finished new timings on the same above file, after removing
the package titlesec which was the cause of the slowdown (thanks for
finding it goes to Jagath AR)

Thought to share the new timings: (all with TL 2015)

---
new Linux pc: 24 GB RAM, intel i7-6700, 64 bits:           4 minutes
older Linux pc: 8 GB RAM, intel i7-930:                     6 minutes
Windows 7 pc 64 bit, VBox: 16 GB RAM, intel i7-3930k:      17 minutes
Windows 7 pc 64 bit, CYGWIN: 16 GB RAM, intel i7-3930k:    54 minutes


In all of these, the slowest of the tex4ht build phases
is the call to dvisvgm, here:

.
Make4ht: dvisvgm -v1 -n -c 1.15,1.15 -p 1- report.idv
..

I was surprised that cygwin now took so much longer, but most
of the time in the cygwin test was spent on the above line: about
80% of the build time went to the dvisvgm call! I have read that
cygwin disk IO is slow.
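
To put numbers like the 80% above on each build phase, one option is to time every command separately. The following is only a minimal sketch, not part of make4ht: the helper names time_stages and report are made up for illustration, and the demo stages are placeholders to be replaced by the real commands copied from the make4ht log (for example the dvisvgm line).

```python
import subprocess
import sys
import time


def time_stages(stages):
    """Run each (label, argv) stage in order and return [(label, seconds)]."""
    timings = []
    for label, argv in stages:
        start = time.perf_counter()
        subprocess.run(argv, check=True)
        timings.append((label, time.perf_counter() - start))
    return timings


def report(timings):
    """Format each stage's wall-clock time and its share of the total."""
    total = sum(seconds for _, seconds in timings) or 1.0
    return [
        f"{label}: {seconds:.1f}s ({100 * seconds / total:.0f}%)"
        for label, seconds in timings
    ]


if __name__ == "__main__":
    # Placeholder stages so the sketch runs anywhere; replace them with
    # the real commands taken from the make4ht log.
    demo = [
        ("stage-a", [sys.executable, "-c", "pass"]),
        ("stage-b", [sys.executable, "-c", "pass"]),
    ]
    print("\n".join(report(time_stages(demo))))
```

This measures wall-clock time only, so it shows which phase dominates but not why.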

--Nasser


Re: [tex4ht] problem with slow compilation of large latex file with large math content

2016-03-27 Thread Jagath AR
On 27 March 2016 at 04:50, Nasser M. Abbasi  wrote:

> On 3/26/2016 5:44 PM, Karl Berry wrote:
>
>>
>>  I have many many latex files this large
>>
>> Is the one you provided already one of the smaller ones?  The smaller
>> the file that still exhibits the problem, the easier to debug.
>>
>>
> The file I put a link to is an average size Latex file. I have some larger.
> I picked this one since it showed the problem clearly.
>
>  given that lualatex takes one hr or so.
>>
>> Oh, lualatex is another important part of the story.  LuaTeX is already
>> significantly slower than standard TeX (or XeTeX), and depending on what
>> your document is doing, we may be hitting some kind of new/unusual
>> slowdown that is specific to luatex.  Must you use luatex?
>>
>>
> Sorry, I meant lualatex takes one hr for _all_ the files (I have about
> 50-60 files like this one for this one test). I did not mean it
> takes one hr for this one file.
>
> Just to be clear: For the specific file I posted
>
>   http://12000.org/tmp/032616/
>
> pdflatex and lualatex take about 2-3 minutes to compile the above file,
> on native linux on new PC.  Even on Vbox and cygwin, pdflatex and lualatex
> are really fast. Not more than 4-5 minutes at most on this file.
>

Hi Nasser,
I have TL 2013 on an i5 machine with 8 GB RAM on Ubuntu 12.04.

I downloaded the files and tested by unloading some of the packages you
load, and found that the package "*titlesec*" is slowing down the
compilation when you go through tex4ht (make4ht). I commented it out in
*my_core.tex* and executed your *compile.sh*. The process is now faster.
Could you please check this?

I have not looked in detail at what is causing the problem with the
'*titlesec*' package under tex4ht.

Regards
Jagath AR


Re: [tex4ht] problem with slow compilation of large latex file with large math content

2016-03-27 Thread Karl Berry
 make:htlatex()
 make:htlatex()
 make:htlatex()

P.S. The texi2dvi script implements one way to check how many runs are
needed: it compares auxiliary files and keeps running TeX until they
stabilize.  I suspect latexmk and others do the same thing, though I've
never looked at them.  So it is doable, though far from trivial.  -k
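
The "run until the auxiliary files stabilize" idea can be sketched roughly as follows. This is not texi2dvi's actual code, only an illustration of the comparison loop; the names snapshot and run_until_stable, and the max_runs cap, are assumptions made for the example. run_once stands for whatever performs a single TeX run.

```python
import hashlib
from pathlib import Path


def snapshot(paths):
    """Map each auxiliary file to a digest of its contents (None if missing)."""
    digests = {}
    for path in map(Path, paths):
        digests[str(path)] = (
            hashlib.sha256(path.read_bytes()).hexdigest() if path.exists() else None
        )
    return digests


def run_until_stable(run_once, aux_paths, max_runs=3):
    """Call run_once() until the auxiliary files stop changing.

    Returns the number of runs performed. max_runs caps the loop so that
    a document whose aux files never settle cannot loop forever.
    """
    before = snapshot(aux_paths)
    for count in range(1, max_runs + 1):
        run_once()
        after = snapshot(aux_paths)
        if after == before:
            return count
        before = after
    return max_runs
```

With the aux files already in place and unchanged by the run, the loop stops after a single run; starting from scratch it needs at least two, because the first run creates the files.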


Re: [tex4ht] problem with slow compilation of large latex file with large math content

2016-03-27 Thread Karl Berry
What are these rules? 

I just explained it.

(that is why we need design documents)

There's no use in continuing to repeat this wish.  We all agree.  What's
needed is more people to do work, not more work to be done!  -k


Re: [tex4ht] problem with slow compilation of large latex file with large math content

2016-03-27 Thread Nasser M. Abbasi

On 3/27/2016 5:13 PM, Karl Berry wrote:



 I call make4ht to compile the file

I don't know anything about make4ht, that's Michal's bag ... Perhaps it
is already smart enough to avoid multiple runs when it's not necessary.
(I'm having deja vu, did we already discuss this, Michal?)  -k



The file

/usr/local/texlive/2015/texmf-dist/scripts/make4ht/make4ht

has this:


if make:length() < 1 then
  if mode == "draft" then
    make:htlatex()
  else
    make:htlatex()
    make:htlatex()
    make:htlatex()
  end
end
--

So, unless the mode is "draft", it always calls latex 3 times.

It would be great if one could find out how to decide whether to call it 3
times, 2 times, or once. Since tex4ht is very slow on large files,
any saving in time is very useful. This could cut the compile of a typical
file I have from 5 hrs to maybe 2 hrs! But one needs to know the
rules to use to decide on the number of times to call it.

What are these rules? (that is why we need design documents)

--Nasser


Re: [tex4ht] problem with slow compilation of large latex file with large math content

2016-03-27 Thread Karl Berry
But I do not know how to decide if I need one run or two or three?

You need three runs if you are starting from scratch.
After the first time, in general, you only need one run unless you are
modifying the things that go into the cross-references (section names,
labels, bib entries, etc.).  

Ordinarily it's not worth thinking about, since the runs are fast
enough.  I only mention it here because if you're repeatedly doing three
five-hour runs when you only need to do one, that would clearly be
undesirable.

I call make4ht to compile the file

I don't know anything about make4ht, that's Michal's bag ... Perhaps it
is already smart enough to avoid multiple runs when it's not necessary.
(I'm having deja vu, did we already discuss this, Michal?)  -k



Re: [tex4ht] problem with slow compilation of large latex file with large math content

2016-03-26 Thread Karl Berry
Hi Nasser,

TL is more optimized for native Linux vs. cygwin.

Just to remark: TL specifically is not "optimized" for any particular
platform (all binaries are built natively).  I think the difference you
are seeing here is an inevitable consequence of running a
resource-intensive job on an emulation layer (cygwin) vs. a native layer
(gnu/linux).  (As for native Windows, it is fundamentally inefficient,
so I'm not surprised it is slow too.  Cygwin or vbox is thus the worst
of both worlds.)

buying new PC and installing Linux on it just in the hope

Wow.  I suspect you are the only person in the world buying hardware to
placate tex4ht!

then it starts to slow down, the higher the number becomes

I think we need some kind of profiling of the TeX run to find the facts.
I don't have an easy recipe at hand.  (And I'm currently trying to get
the next TUGboat out the door, plus prepare for the TL pretest, so it's
going to be hard to devote significant time to this for a while,
unfortunately ...)

But the issue is, pdflatex and lualatex take about 5 minutes
on the same file to compile it to pdf !

Ok, so let's consider PDF first, since that is simpler to think about
than HTML.

I have many many latex files this large

Is the one you provided already one of the smaller ones?  The smaller
the file that still exhibits the problem, the easier to debug.

given that lualatex takes one hr or so.

Oh, lualatex is another important part of the story.  LuaTeX is already
significantly slower than standard TeX (or XeTeX), and depending on what
your document is doing, we may be hitting some kind of new/unusual
slowdown that is specific to luatex.  Must you use luatex?

Finally, is there a document that describes the passes/process
that tex4ht uses to compile to HTML at some high level?

The htlatex script is six lines long, and is the clearest possible
summary of what is run.  I'll omit the TeX gobbledygook that Eitan
uses.

#!/bin/sh
latex $5 ...
latex $5 ...
latex $5 ...
tex4ht -f/$1  -i~/tex4ht.dir/texmf/tex4ht/ht-fonts/$3
t4ht -f/$1 $4

I assume Michal's make4ht is fundamentally equivalent.

As CVR says, the reason for the three latex runs is simply to resolve
references.  Thus if you are repeatedly running the same doc, with all
aux files already in place, one run would suffice.


For sure, a design document, among many others, would be extremely
desirable, not to mention many updates to the code, not to mention a new
release, not to mention ...  What's fundamentally needed are more
volunteers with time and ability to help develop and document this
highly complex system!

karl


Re: [tex4ht] problem with slow compilation of large latex file with large math content

2016-03-26 Thread Radhakrishnan CV
On Sat, Mar 26, 2016 at 3:38 PM, Nasser M. Abbasi  wrote:

>
> The zip file is much larger, about 270 MB :( But now it
> contains everything to compile this one latex file.
>

I have downloaded the previous archive. I will download this one if needed. I
will temporarily suspend svg generation and see if tex4ht runs faster. BTW,
please don't expect a quick response from me. I will try to respond as quickly
as possible.

Best regards
-- 
Radhakrishnan
River Valley



Re: [tex4ht] problem with slow compilation of large latex file with large math content

2016-03-26 Thread Nasser M. Abbasi



Thanks for the offer to look into it. I put the latex file
and all the needed include files I use and the .cfg and main.mk4
and the command in one zip file. Here it is, in this folder:

http://12000.org/tmp/032616/



Oops. I am sorry, I forgot to also include the svg images for use
by tex4ht in the zip file earlier. (I use pdf2svg to convert pdf
image files to svg files for use in HTML in the \includegraphics call).

If someone happened to have downloaded the zip file already, please
delete it and obtain the updated zip file I just uploaded now:

http://12000.org/tmp/032616/

The zip file is much larger, about 270 MB :( But now it
contains everything to compile this one latex file.


thank you,
--Nasser


Re: [tex4ht] problem with slow compilation of large latex file with large math content

2016-03-26 Thread Nasser M. Abbasi

On 3/26/2016 1:11 AM, Radhakrishnan CV wrote:

On Sat, Mar 26, 2016 at 3:23 AM, Nasser M. Abbasi  wrote:

​[...]​

For example, for one file, using Vbox, it took 14 hrs
for make4ht to compile the file to html. On cygwin, it took
a little less than that: about 10 hrs. This is on Windows 7, 64 bit,
16 GB RAM, fast intel i7-3930k CPU.



That is terrible! But it contradicts my own experience. At work, we
do large documents (on average 300 pages long, 800-1000 bibliographic
items, 500 to 800 equations, very complex math, large number of figures,
double column output) on a daily basis, but it takes a few seconds to
generate Elsevier XML output. Recently, another article with 350 pages, ~70
figures, four or five very long tables each spanning several pages, 350 bib
items, several hundred cross references, but very little math, took only 12
secs for three runs of TeX4ht to generate NLM XML output on a server where
at least 50 users are working simultaneously using the same resources. The only
documents that take, say, 60 secs or a bit more are documents with
atomic and nuclear data tables, each table typically running to 200 pages!
Otherwise, a tex4ht run is a breeze in my experience, and that on a server
shared by at least forty to fifty users at a time.

[...]
   ​



But the issue is, pdflatex and lualatex take about 5 minutes
on the same file to compile it to pdf !

I can understand converting to HTML will take more time,
since each equation is converted to svg image,



on the fly? Why don't you write out the math in a file and process it
separately to generate the svg images in one go?



Sorry, I do not understand what this means. I have a latex
file, which contains math, and then call tex4ht to
generate the HTML. I use make4ht to compile it and tell
it to use svg for math.


[...]



It also seems tex4ht has more than one pass, as I see it
generating this sequence of numbers more than once.



tex4ht needs three passes for fixing cross links and multicolumns in
tables.
​


Ok. But each pass is slow, as it seems to go through
all the pages over and over again.







I can make a zip file with a typical large latex file
with all the images it uses and my .cfg and main.mk4
and the command I used to compile the latex file if
anyone wants to confirm this problem. Would this be ok?



I would love to debug your problem. Please do send it to me. If the archive
is too large, kindly put it at some location and send me the URL.
​


Thanks for the offer to look into it. I put the latex file
and all the needed include files I use and the .cfg and main.mk4
and the command in one zip file. Here it is, in this folder:

http://12000.org/tmp/032616/

There is a file called compile.sh which has this line:

make4ht --lua -u -c ./nma.cfg -e ./main.mk4 report.tex "htm,3,pic-align,notoc*"

The report.tex there is 17 MB.  You'll see the slowdown
once the page numbers get to over [1000] etc. It will take a few hours
to compile.

Please let me know if you need anything else or anything
I can try on my end. I made sure all the needed files are there.
If I missed something, I will update it.




​[...]
​


Finally, is there a document that describes the passes/process
that tex4ht uses to compile to HTML at some high level? Like a block
diagram, or such. I am not able to find such a design document.



A schematic diagram of a tex4ht run, namely tex4ht.pdf, is attached to this
mail. Hope this helps.



Thanks for the diagram. But for something as important as
tex4ht, there should really be a more detailed design
document.


​Best regards​



thank you,
--Nasser



Re: [tex4ht] problem with slow compilation of large latex file with large math content

2016-03-25 Thread Reinhard Kotucha
On 2016-03-25 at 16:53:40 -0500, Nasser M. Abbasi wrote:

 > I have lots of large latex files, with lots of pages in
 > them with large number of equations, generated by
 > computer algebra systems. I also have lots of includegraphics
 > in these files for svg images.
 > 
 > I noticed that tex4ht becomes very slow as number of pages
 > increase. This is becoming so bad, that I ended up
 > buying new PC and installing Linux on it just in the hope
 > it will speed things up (I was using Vbox on windows,
 > then I tried cygwin on windows).

On Linux you could run tex4ht within strace(1).

strace is a program which traces system calls and its output can give
you a hint _where_ tex4ht spends so much time.

Please consult the manual page.
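
As a rough sketch of what such a run could look like, the helper below only builds an strace command line; the function name strace_argv and the summary file name are made up for illustration, and the traced command is the make4ht line from compile.sh quoted elsewhere in this thread. The options used are standard strace ones: -f follows child processes, -c counts calls, errors and time per system call, and -o writes the report to a file.

```python
import shutil
import subprocess


def strace_argv(command, summary_file="strace-summary.txt"):
    """Build an strace command line that summarises system calls.

    -f follows child processes, -c counts calls and time per system call,
    and -o writes the report to summary_file instead of stderr.
    """
    return ["strace", "-f", "-c", "-o", summary_file, *command]


if __name__ == "__main__":
    # The traced command is only an example; substitute the slow step.
    argv = strace_argv([
        "make4ht", "--lua", "-u", "-c", "./nma.cfg", "-e", "./main.mk4",
        "report.tex", "htm,3,pic-align,notoc*",
    ])
    if shutil.which("strace") and shutil.which("make4ht"):
        subprocess.run(argv, check=False)
    else:
        print("would run:", " ".join(argv))
```

Note that strace only sees system calls, so if the time is spent inside TeX macro expansion rather than in file IO, the summary will not show it.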

From https://sourceforge.net/projects/strace :

 | strace is a diagnostic, debugging and instructional userspace
 | tracer for Linux. It is used to monitor interactions between
 | userspace processes and the Linux kernel, which include system
 | calls, signal deliveries, and changes of process state. The
 | operation of strace is made possible by the kernel feature known as
 | ptrace.

Regards,
  Reinhard

-- 
--
Reinhard Kotucha                Phone: +49-511-3373112
Marschnerstr. 25
D-30167 Hannover                mailto:reinhard.kotu...@web.de
--