As I mentioned previously, I am adding semantic and logical structuring
to the poppler core.

My plan is to work out which category each piece of content belongs to by
post-processing the XML. Any suggestions on how to map this XML back onto
the PDF (reverse engineering it, or perhaps post-engineering it) would be
appreciated.
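
To make the post-processing idea concrete, here is a rough sketch of one
possible heuristic: a line that opens (or closes) most pages, once digits
are masked out, is treated as a running header (or footer). The XML layout
(<document>, <page>, <line>), the function names and the 0.6 threshold are
all assumptions made up for illustration; they are not the actual poppler
output or API.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical input: element names are illustrative only.
SAMPLE = """
<document>
  <page><line>Poppler Manual</line><line>Intro text</line><line>Page 1</line></page>
  <page><line>Poppler Manual</line><line>More text</line><line>Page 2</line></page>
  <page><line>Poppler Manual</line><line>Final text</line><line>Page 3</line></page>
</document>
"""

def normalise(text):
    # Mask digit runs so that "Page 1" and "Page 2" compare equal.
    return re.sub(r"\d+", "#", text.strip())

def tag_running_lines(root, threshold=0.6):
    """Rename the first/last <line> of a page to <header>/<footer> when the
    same normalised text appears in that position on enough pages."""
    pages = [p for p in root.findall("page") if len(p) > 0]
    first = Counter(normalise(p[0].text or "") for p in pages)
    last = Counter(normalise(p[-1].text or "") for p in pages)
    needed = threshold * len(pages)
    for p in pages:
        if first[normalise(p[0].text or "")] >= needed:
            p[0].tag = "header"
        if len(p) > 1 and last[normalise(p[-1].text or "")] >= needed:
            p[-1].tag = "footer"
    return root

root = tag_running_lines(ET.fromstring(SAMPLE))
print(ET.tostring(root, encoding="unicode"))
```

This only looks at the first and last line of each page, so multi-line
headers or pages with differing odd/even headers would need more work.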

In a few days I will have a very accurate XML generated, with
<header></header>, <footer></footer> and table of contents tags.

This will involve "pushing" the actual "printed" page numbers, adding a
hyperlink to each ToC entry, and partitioning the page structure as far as
the 1.3 standard allows.
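
As a minimal sketch of the ToC linking step, assuming the printed page
number can be read from each page's <footer>: build a map from printed
label to physical page index, then attach a link target to every ToC
entry. The <toc>/<entry> elements, the "printed", "index" and "href"
attributes and the "#page-N" target format are hypothetical names chosen
for this example only.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: element and attribute names are illustrative only.
SAMPLE = """
<document>
  <toc>
    <entry printed="1">Introduction</entry>
    <entry printed="2">Usage</entry>
  </toc>
  <page index="0"><footer>i</footer></page>
  <page index="1"><footer>1</footer></page>
  <page index="2"><footer>2</footer></page>
</document>
"""

def link_toc(root):
    """Add an href attribute to each ToC entry whose printed page number
    matches the footer text of some page."""
    printed_to_index = {}
    for page in root.iter("page"):
        footer = page.find("footer")
        if footer is not None and footer.text:
            # Keep the first page seen for a given printed label.
            printed_to_index.setdefault(footer.text.strip(), page.get("index"))
    for entry in root.iter("entry"):
        target = printed_to_index.get(entry.get("printed"))
        if target is not None:
            entry.set("href", "#page-" + target)
    return root

linked = link_toc(ET.fromstring(SAMPLE))
for entry in linked.iter("entry"):
    print(entry.text, "->", entry.get("href"))
```

Entries whose printed number is not found in any footer are left without
a link rather than guessed at.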

My code is extremely modular, neat and efficient, and includes an OO API,
so it should be easy to extend with author, title, publisher, year and
section title extraction capabilities.
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
