I've been fighting a problem where basically anything I put in a document,
howsoever tiny, causes TOC, \ref->\label and \cite->\biblio to all break.
And by that I mean, figures/labels, listings, for sure.  But even just
index entries.

TL;DR I wrote a tool to analyze and fix broken links in the generated EPUB,
and found that almost all (that is to say, all but ONE -- nearly 500) were
fixable by doing some minor matching-up of busted hrefs with ids using
simple pattern-matching, as described below.  The fixed EPUB worked fine,
and I was able to verify that the fixed links were correct (hence, not just
stitching-together random hrefs->ids in the book).

===========================

I had a brainstorm last night, and wrote a tool to implement it.  I had
been noticing that the fragment-ids look like

  xNNNN-MMMMM

(where the "MMMMM" part can contain dots and maybe some letters, but rarely so).

And I noticed a few things:

(1) [multi-file problem] sometimes an href in file F1 looks like "#x1-32"
when there is no such id in file F1, but there IS such an id in file F2.
One might call this an unfortunate artifact of the generation of multiple
files: if only a single HTML file were generated, this wouldn't even be a
problem.

(2) [wrong prefix problem] sometimes (a real example) an href looks like
"#x7-15001", and there is no such ID anywhere in the files.  But there IS
an ID "x8-15001".

(3) and sometimes, more than one ID in the files has the same "MMMMM" part, viz.

mainch1.html#x6-80001
mainch2.html#x8-80001

So I wrote a tool that found all the hrefs and IDs, and "did the math" that
is implied by the above thinking, and found that in my book there was:

* ONE instance of case (3)
* THREE instances of case (1)
* 446 instances of case (2)
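
For anyone who wants to try the same analysis, here is a minimal sketch of
the "math" described above.  It is not the actual tool: the names
(classify_links, the "other-file"/"wrong-prefix"/"ambiguous" labels) are made
up for illustration, it assumes the HTML files have already been read out of
the EPUB into a dict of file name -> text, and it assumes the part after the
first "-" in a fragment-id is the part worth matching on.

```python
from collections import defaultdict
from html.parser import HTMLParser


class _Collector(HTMLParser):
    """Collect every id attribute and every href that carries a fragment."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if value is None:
                continue
            if key == "id":
                self.ids.add(value)
            elif key == "href" and "#" in value:
                self.hrefs.append(value)


def classify_links(files):
    """files: dict of file name -> HTML text (an assumed input shape).

    Returns a list of (file, href, kind, suggestion) for each broken href.
    """
    ids_by_file, hrefs_by_file = {}, {}
    for name, text in files.items():
        collector = _Collector()
        collector.feed(text)
        ids_by_file[name] = collector.ids
        hrefs_by_file[name] = collector.hrefs

    where = defaultdict(list)      # full id -> files that define it
    by_suffix = defaultdict(list)  # part after the first "-" -> (file, id)
    for name in sorted(ids_by_file):
        for ident in sorted(ids_by_file[name]):
            where[ident].append(name)
            by_suffix[ident.partition("-")[2]].append((name, ident))

    report = []
    for name in sorted(hrefs_by_file):
        for href in hrefs_by_file[name]:
            target, _, frag = href.partition("#")
            target_file = target or name
            if target_file not in ids_by_file:
                continue  # external link, not ours to fix
            if frag in ids_by_file[target_file]:
                continue  # link already resolves
            owners = where.get(frag, [])
            suffix = frag.partition("-")[2]
            candidates = by_suffix.get(suffix, []) if suffix else []
            if len(owners) == 1:
                # case (1): the id exists, but in a different file
                report.append((name, href, "other-file", owners[0] + "#" + frag))
            elif not owners and len(candidates) == 1:
                # case (2): same suffix, different xNNNN prefix
                cand_file, cand_id = candidates[0]
                report.append((name, href, "wrong-prefix", cand_file + "#" + cand_id))
            elif owners or candidates:
                # case (3): more than one plausible target, so no suggestion
                report.append((name, href, "ambiguous", None))
            else:
                report.append((name, href, "unfixable", None))
    return report
```

Each tuple in the result is one "suggestion"; entries whose last element is
None are the ones a human still has to look at.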

I don't know Lua, and I sure don't know TeX (35 years ago, I wrote my PhD in
LaTeX (of course), and for a few years after that, wrote papers in LaTeX,
but it's been easily 30 years since I wrote LaTeX).  [I'm just helping a friend
to translate his book]  But this seems ..... indicative of some sort of
pattern of a problem.

From the story above, I hope it's obvious that one could take the
"suggestions" the tool emits, which I print as below, and build a little
mapping that rewrites those broken hrefs to working ones.  I implemented
this, and ..... haha, it works:

(1) epubcheck reports no broken references
(2) when I view the epub in ebook-viewer and check out a bunch of
references (to figures, to chapters, to index entries, to biblio entries),
they all work.
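
The rewriting step itself can be sketched as below.  Again this is only an
illustration under assumptions, not the code I ran: rewrite_hrefs is a
made-up name, the mapping is assumed to be per-file (broken href -> working
href), and it assumes the generated HTML always writes href="..." with
double quotes.

```python
import re

# Assumes attributes are written as href="..." with double quotes.
HREF_ATTR = re.compile(r'href="([^"]*)"')


def rewrite_hrefs(html_text, mapping):
    """Replace each href found in mapping; leave every other href alone."""

    def fix(match):
        old = match.group(1)
        return 'href="%s"' % mapping.get(old, old)

    return HREF_ATTR.sub(fix, html_text)


page = '<a href="#x7-15001">Fig. 3</a> <a href="#x2-1000">Intro</a>'
fixed = rewrite_hrefs(page, {"#x7-15001": "#x8-15001"})
```

Running the fixed files back through epubcheck is what tells you whether
the mapping was complete.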
