On Friday 23 December 2011 05:11 AM, Herbert Sitz wrote:
Susan Jolly <easjolly@...> writes:
This poster's question is significant for accessibility.  Braille, large
print, speech, and other accessible versions of print editions typically use
the page numbers of the (base) print (paged media) edition to allow users of
accessible documents to communicate with each other and with users of the
print edition.  While I appreciate that the concept of "page number" is
somewhat meaningless when using eReaders, the accessibility community has
not AFAIK addressed alternative solutions. So at least in the foreseeable
future this is a capability that accessible media producers need.

Susan --

Good point, which I hadn't thought of.  My own query is driven by a slightly
different but related need:  an academic setting where students may be using
ebook, html, and/or pdf versions.  Without some kind of location-based
counter common to the text of all versions, there's no good way for users of
different versions to refer to the location of a particular passage.

In an online world, the concept and usefulness of pages disappear. TeX4ht works on this basic premise. We also have many devices with different screen geometries, and fixed pages become practically useless once HTML- or XML-based markup lets the text reflow, unlike rigid PDF. So, instead of treating a PDF with rigid margins as the definitive version, we have to return to the wisdom of our forefathers, who produced the Bible in many formats for different geometries while still retaining the ability to refer to any chapter, verse, line, etc., uniformly across all versions.

So the best option is to use paragraph numbers rather than page numbers as the basis for references.
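To make the paragraph-number idea concrete, here is a minimal sketch in Python (the language the original poster mentions) that post-processes an HTML file, such as htlatex output, by giving each `<p>` element an anchor id and a visible `[N]` marker. It is not part of TeX4ht: the function name `number_paragraphs`, the `para-N` id scheme and the `paranum` class are hypothetical choices. It uses a regular expression instead of a real HTML parser, so it assumes paragraphs open with a plain `<p ...>` tag that does not already carry an id. The numbers are only useful for cross-referencing if the PDF and other versions number the same paragraphs in the same order.

```python
import re

# Matches an opening paragraph tag such as <p> or <p class="indent" >.
# The character after "p" must be whitespace or ">", so <pre> and <param>
# are not matched. A regex is only a rough stand-in for a real HTML parser.
P_OPEN = re.compile(r"<p(\s[^>]*)?>", re.IGNORECASE)


def number_paragraphs(html, prefix="para"):
    """Number every <p> in document order (hypothetical helper).

    Each opening tag gets an id like "para-3" (so a passage can be linked
    as doc.html#para-3) and a visible "[3]" marker readers can cite.
    Returns the rewritten HTML and the number of paragraphs found.
    """
    count = 0

    def add_number(match):
        nonlocal count
        count += 1
        attrs = match.group(1) or ""  # keep any existing attributes
        return (f'<p id="{prefix}-{count}"{attrs}>'
                f'<span class="paranum">[{count}]</span> ')

    return P_OPEN.sub(add_number, html), count


if __name__ == "__main__":
    sample = '<p class="indent" >First.</p><pre>code</pre><p>Second.</p>'
    numbered, total = number_paragraphs(sample)
    print(total)  # 2
    print(numbered)
```

For a real file, read the htlatex output, pass its text to `number_paragraphs`, and write the result back. A parser-based approach (for example with BeautifulSoup, as suggested later in the thread) would be more robust against unusual markup.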

The counter doesn't need to be the pdf page number, but that's an
already-existing counter that makes sense.  Whatever counter is used, it must be
present in all versions.

Looking further at tex4ht, I'm not sure that merely having the ability to insert a
counter at page breaks would solve this problem.  I have .tex files that I
process to PDF using pdflatex, and which I process with tex4ht's htlatex to get
the html.

TeX4ht's page breaks bear no relation to those in the corresponding PDF. If you create XHTML or XML with MathML, a single-page document can produce a DVI of many hundreds of pages, most of which contain only a single character. So relying on TeX4ht's page breaks gets you nowhere.

The problem I see is that tex4ht alters the formatting in the process of
generating the html.  tex4ht first compiles the document to an intermediate dvi,
then uses that dvi to generate the html.

That is correct.

I had expected the pagination of the
dvi file to correspond to the pagination of the pdf generated by pdflatex.

Unfortunately, no. TeX4ht's objective is not to format the document but to translate it from one markup to another, and making the text look exactly as it would in a printable version is hardly necessary for that.

Unfortunately, the pagination does not necessarily match.  I'm not sure what

It won't match at all.

formatting changes tex4ht makes as part of compiling to dvi (besides disabling

The DVI is simply a convenient intermediate format from which TeX4ht's post-processor extracts the text and the markup injected into it as \specials. It also makes it easy to replace or manipulate any character in any manner, with the help of Eitan's ingenious hypertext fonts.

header and footer, which would not necessarily affect pagination).  So merely
having ability to hook in and put in a page counter for each new dvi page would
not necessarily give pagination markers that correspond to the PDF.

All the page breaks and line breaks differ. Attributes such as bold, italic, large, etc. lose the meaning they have in the PDF and are instead expressed in a different markup system that the browser understands. Since glue, vertical and horizontal skips, and character widths and heights lose their meaning in the TeX4ht-generated DVI, we will seldom get output that, when printed, visually corresponds to the PDF.

I wonder whether there are some optional settings in tex4ht that would make the
dvi pagination match (or even closely match) the pagination in the PDF.

Sorry, I don't think that will happen.

I see a non-tex4ht-related way to generate the page numbers I want in the html,
but it's not trivial.  Basically, the steps would be these:

Sure, that is the best option. I too would be interested in seeing a better solution.

[...]

Does anyone know whether there is a publicly available solution for this?  I
would probably write it in Python using the BeautifulSoup html api; I wonder
whether something like this is already available on github or elsewhere.

Or maybe there actually is some way to get tex4ht to (1) generate dvi with
pagination that corresponds to PDF pagination, and (2) include a page counter in
the html that corresponds to the PDF pages.

This is an impossibility in TeX4ht as far as I know.

Best regards
--
Radhakrishnan
