On 12/11/2023 9:06 PM, William F Hammond wrote:
Hello Nasser,
You don't give us much to go on. But it does provoke my curiosity.
Sorry, but I did send Michal detailed information on this.
I just added the bug for tracking and did not think anyone else would
be interested in all the boring details of my build.
I assume that you are able to build the 57,000 page pdf from the tex source
that you want to process with tex4ht.
Oh, yes, of course. The file builds OK in lualatex. Here is the link
<https://12000.org/my_notes/CAS_integration_tests/reports/summer_2023_Rubi_4_17_3/test_cases/210_Hebisch/report.htm>
There are over 10,000 subsections, and tex4ht breaks down on
reportsubsection1100
which is this
<https://12000.org/my_notes/CAS_integration_tests/reports/summer_2023_Rubi_4_17_3/test_cases/210_Hebisch/reportsubsection1100.htm#x1117-109610003.10.84>
If you click <NEXT> at the top of the above page you get a link-not-found
error, since no more subsections are processed after that. Almost 9,000
subsections that should be there were not generated.
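
A quick way to confirm where the generated pages stop is to list the subsection files in the output directory and look for gaps. The following is only a minimal sketch: it assumes the pages are named reportsubsection<N>.htm with N counting up from 1 (inferred from the single file name quoted above), and `expected_last` is a hypothetical parameter for the number of subsections the full build should produce.

```python
import re

# Assumed naming scheme, inferred from "reportsubsection1100.htm" in this thread.
PAGE_RE = re.compile(r"^reportsubsection(\d+)\.htm$")


def subsection_numbers(names):
    """Return the sorted subsection numbers found in an iterable of file names."""
    numbers = set()
    for name in names:
        match = PAGE_RE.match(name)
        if match:
            numbers.add(int(match.group(1)))
    return sorted(numbers)


def summarize(names, expected_last):
    """Report how many pages exist, the highest one, and which are missing.

    expected_last is a hypothetical value: the number of the last
    subsection page the complete build is supposed to produce.
    """
    present = subsection_numbers(names)
    have = set(present)
    missing = [n for n in range(1, expected_last + 1) if n not in have]
    return {
        "count": len(present),
        "highest": present[-1] if present else None,
        "missing": missing,
    }
```

To run it against a real build, pass the names from the output directory, for example `summarize(os.listdir(out_dir), 10000)` after `import os`, where `out_dir` and the figure 10000 are placeholders.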
Is html output the final tex4ht target? I'm assuming it is.
Yes, only HTML (mathjax) mode.
You say:
[INFO] make4ht-lib: parse_lg process file: reportsubsection1100.htm
[WARNING] domfilter: DOM parsing of reportsubsection1100.htm failed:
[WARNING] domfilter:
...ive/2023/texmf-dist/tex/luatex/luaxml/luaxml-mod-xml.lua:175: Incomplete
XML Document [char=33675]
From this I deduce that the 57,000 page document is being written in HTML
pieces by tex4ht, "reportsubsection1100.htm" is one of those pieces, and
perhaps not all expected pieces have been generated.
Have you checked whether "reportsubsection1100.htm" is well-formed XML
using, say, the tool "xmlwf" found in the expat distribution?
I built the code for reportsubsection1100.htm on its own, and it builds OK
with make4ht. It is only when the code is part of report.tex (the full
LaTeX file which includes everything) that this problem is found.
I do not know xmlwf. I just see these domfilter and XML messages show
up and that is all. I really know very little about these.
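
For anyone without xmlwf at hand, roughly the same well-formedness check can be sketched with Python's standard xml.parsers.expat module, which wraps the same expat library. This is an illustration, not the check that domfilter itself performs (domfilter uses the LuaXML parser), and `check_well_formed` is a hypothetical helper name. HTML output that is not strict XML may be reported as an error even when browsers accept it.

```python
import xml.parsers.expat


def check_well_formed(data: bytes):
    """Return None if data is well-formed XML, else a tuple describing the error.

    The tuple is (message, line, column, byte_index) for the first
    error expat encounters.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return (message, err.lineno, err.offset, parser.ErrorByteIndex)
    return None
```

Reading the page in binary mode and passing the bytes, e.g. `check_well_formed(open("reportsubsection1100.htm", "rb").read())`, would show whether expat also sees the document as incomplete and roughly where.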
But to help Michal, I sent him a ZIP file with everything in it so he can
reproduce this on his computer also.
It seems related to the use of tables, since that is the place where it fails.
--Nasser
-- Bill
William F Hammond
Email: [email protected]
https://www.facebook.com/william.f.hammond
http://www.albany.edu/~hammond/
𝑻𝒉𝒆 𝒕𝒊𝒎𝒆 𝒕𝒐 𝒔𝒂𝒗𝒆 𝒂 𝒅𝒆𝒎𝒐𝒄𝒓𝒂𝒄𝒚 𝒊𝒔 𝒃𝒆𝒇𝒐𝒓𝒆 𝒊𝒕
𝒊𝒔 𝒍𝒐𝒔𝒕. -- 𝐊𝐞𝐧 𝐁𝐮𝐫𝐧𝐬
On Mon, Dec 11, 2023 at 5:04 PM Nasser M. Abbasi <[email protected]> wrote:
URL:
<http://puszcza.gnu.org.ua/bugs/?618>
Summary: Incomplete XML Document, domfilter error, truncated build on large file.
Project: tex4ht
Submitted by: nma123
Submitted on: Tue Dec 12 01:04:12 2023
Category: None
Priority: 5 - Normal
Severity: 7 - Important
Status: None
Privacy: Public
Assigned to: None
Originator Email:
Open/Closed: Open
Discussion Lock: Any
_______________________________________________________
Details:
I have been working with Michal on this via private email but thought I would
enter a bug report on this just for tracking and documentation.
I have one large file (57,000 PDF pages). When it is compiled with tex4ht
(which takes 14 hrs), at about 10% of the way through generating the final
HTML pages it gets an XML error and stops,
i.e. the remaining 90% of the sections are missing from the final web pages.
-------------------------------------------------------
[INFO] make4ht-lib: parse_lg process file: reportsubsection1100.htm
[WARNING] domfilter: DOM parsing of reportsubsection1100.htm failed:
[WARNING] domfilter:
...ive/2023/texmf-dist/tex/luatex/luaxml/luaxml-mod-xml.lua:175: Incomplete
XML Document [char=33675]
[INFO] make4ht-lib: parse_lg process file: reportsubsection1100.htm
[WARNING] domfilter: DOM parsing of reportsubsection1100.htm failed:
[WARNING] domfilter:
...ive/2023/texmf-dist/tex/luatex/luaxml/luaxml-mod-xml.lua:175: Incomplete
XML Document [char=33675]
[INFO] make4ht-lib: parse_lg process file: reportsubsection1100.htm
----------------------------------
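
The warnings above name both the failing file and a character position, so a small script can pull those out of a long build log and show what sits near that position in the page. This is a rough sketch under stated assumptions: the function names are hypothetical, the regular expression is fitted only to the message layout quoted here (including its line wrapping), and whether `char=` is a byte or character count, or 0- or 1-based, is not known from this report.

```python
import re

# Matches "DOM parsing of <file> failed:" followed, possibly several
# lines later, by "[char=<offset>]". Fitted to the log excerpt above only.
FAILURE_RE = re.compile(r"DOM parsing of (\S+) failed:.*?\[char=(\d+)\]", re.S)


def domfilter_failures(log_text):
    """Return unique (file name, offset) pairs in order of first appearance."""
    seen = []
    for name, offset in FAILURE_RE.findall(log_text):
        item = (name, int(offset))
        if item not in seen:
            seen.append(item)
    return seen


def context(text, offset, width=200):
    """Return the text within `width` characters on either side of offset."""
    start = max(0, offset - width)
    return text[start:offset + width]
```

Applied to the excerpt above, `domfilter_failures` should collapse the repeated warnings into a single entry for reportsubsection1100.htm, and `context(page_text, 33675)` would then show the markup around the reported position, where `page_text` is the content of that file.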
I've just sent Michal a link to a complete, self-contained ZIP file (450 MB)
with instructions on how to run it standalone in order to see these errors
on his end.
I tried this on the latest TeX Live 2023 on a new Linux installation.
I will work with Michal to provide any additional information he needs from
me, to hopefully find the cause of this problem.
This happens only on this file. I think it may be due to the large size, since
the LaTeX code is all generated by the same program and only this file gives
this error.
--Nasser
_______________________________________________________
Reply to this item at:
<http://puszcza.gnu.org.ua/bugs/?618>