Someone on the list may have a better idea, but I would almost certainly start with the PDFDoc created by reading the original document and inject the metadata you have collected back into it. I believe this was Leonard's recommendation as well.
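As a rough illustration of the "collect the metadata" half only, here is a minimal Python sketch that reads pdftohtml -xml style output and builds a per-page table that could later be applied to the original document. The `logical` attribute on `<page>` and the `role` attribute on `<text>` are hypothetical stand-ins for whatever your extended output actually emits, and the sample XML is made up. Writing the values back through PDFDoc is not shown.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the general shape of pdftohtml -xml output.
# "logical" and "role" are hypothetical attributes standing in for the
# extended page-number and header/footer information.
SAMPLE = """<pdf2xml>
  <page number="1" logical="i" top="0" left="0" height="1188" width="918">
    <text top="40" left="100" width="200" height="12" font="0" role="header">Preface</text>
    <text top="120" left="100" width="400" height="12" font="0">Body text.</text>
  </page>
  <page number="2" logical="1" top="0" left="0" height="1188" width="918">
    <text top="120" left="100" width="400" height="12" font="0">More body text.</text>
    <text top="1150" left="450" width="20" height="12" font="0" role="footer">1</text>
  </page>
</pdf2xml>"""


def collect_page_metadata(xml_text):
    """Map zero-based physical page index -> collected metadata for that page."""
    root = ET.fromstring(xml_text)
    pages = {}
    for page in root.iter("page"):
        # pdftohtml numbers pages from 1; use a zero-based index as the key.
        index = int(page.get("number")) - 1
        entry = {"label": page.get("logical"), "header": [], "footer": []}
        for text in page.iter("text"):
            role = text.get("role")
            if role in ("header", "footer"):
                # itertext() also picks up text nested in <b>/<i> children.
                entry[role].append("".join(text.itertext()).strip())
        pages[index] = entry
    return pages


metadata = collect_page_metadata(SAMPLE)
for index, entry in sorted(metadata.items()):
    print(index, entry["label"], entry["header"], entry["footer"])
```

Keying the table by physical page index keeps it independent of the XML, so the same table can drive whatever code ends up modifying the original document.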
--josh

On 11/14/11 10:42 PM, "Alec Taylor" <[email protected]> wrote:

>Good afternoon,
>
>How would I go about reverse-engineering an XML file generated by
>pdftohtml -xml back into the [same] PDF?
>
>I have been spending a long time extending the XML output to include
>proper page numbers and header/footer detection.
>
>It would be extremely useful if I could push the additional logical
>structure information and page numbers back into the PDF the XML was
>generated from.
>
>How would I go about doing this?
>
>Thanks for all suggestions,
>
>Alec Taylor
>
>PS: T-9 days (or less!) until PATCH :)
>_______________________________________________
>poppler mailing list
>[email protected]
>http://lists.freedesktop.org/mailman/listinfo/poppler
