For proper structure, you are going to need to find a way to match the structure information with the elements in the content stream, then modify the stream accordingly (marking up the matched content) and add the relevant dictionaries (structure tree, structure elements, etc.).
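To make that a bit more concrete, here is a rough Python sketch. It assumes each text element in the extended pdftohtml XML can be paired, in order, with one BT ... ET text object on the page. That is a big assumption, because real content streams rarely line up that neatly. The helper names tag_text_objects, struct_elem and page_labels are made up for illustration and are not part of poppler. The sketch only builds PDF syntax as strings: the marked-content wrapping (/Role <</MCID n>> BDC ... EMC), a StructElem that points at that MCID on its page, and a /PageLabels number tree for the page numbers. It does not write any of this back into the file. That step (new stream /Length, /StructTreeRoot, /ParentTree, /MarkInfo, xref) is left out.

```python
import re
from typing import List, Tuple

# Hypothetical helpers (not a poppler API). They only build PDF syntax as
# strings; writing the objects back into the file is not shown.

# Naive match of one text object. Assumes "BT"/"ET" never occur inside
# string operands, which does not hold for arbitrary content streams.
TEXT_OBJECT = re.compile(r"\bBT\b.*?\bET\b", re.DOTALL)


def tag_text_objects(content: str, roles: List[str]) -> Tuple[str, List[Tuple[str, int]]]:
    """Wrap the i-th BT...ET text object in '/Role <</MCID i>> BDC ... EMC',
    using roles[i] as the structure type. Text objects beyond len(roles)
    are left untagged. Returns the new stream and the (role, MCID) pairs."""
    tagged: List[Tuple[str, int]] = []

    def wrap(match: "re.Match[str]") -> str:
        mcid = len(tagged)
        if mcid >= len(roles):
            return match.group(0)  # no structure info for this object
        role = roles[mcid]
        tagged.append((role, mcid))
        return f"/{role} <</MCID {mcid}>> BDC\n{match.group(0)}\nEMC"

    return TEXT_OBJECT.sub(wrap, content), tagged


def struct_elem(role: str, mcid: int, page_ref: str, parent_ref: str) -> str:
    """A StructElem whose single kid is marked content `mcid` on page `page_ref`."""
    return f"<< /Type /StructElem /S /{role} /P {parent_ref} /Pg {page_ref} /K {mcid} >>"


def page_labels(ranges: List[Tuple[int, str, int]]) -> str:
    """/PageLabels number tree from (first page index, style, first number);
    style is D (decimal), R/r (roman) or A/a (letters)."""
    entries = " ".join(f"{start} << /S /{style} /St {first} >>" for start, style, first in ranges)
    return f"<< /Nums [{entries}] >>"


if __name__ == "__main__":
    stream = "BT /F1 12 Tf 72 720 Td (Chapter 1) Tj ET\nBT /F1 10 Tf 72 700 Td (Body text) Tj ET"
    tagged_stream, mcids = tag_text_objects(stream, ["H1", "P"])
    print(tagged_stream)
    # "5 0 R" (page) and "2 0 R" (parent element) are placeholder references.
    print([struct_elem(role, mcid, "5 0 R", "2 0 R") for role, mcid in mcids])
    # Front matter numbered i, ii, ... then 1, 2, ... from the fifth page.
    print(page_labels([(0, "r", 1), (4, "D", 1)]))
```

The page-label example uses lowercase roman numerals for the front matter and switches to decimal at page index 4, which is the kind of mapping the header/footer detection would have to supply.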
On 11/15/11 12:23 PM, "Josh Richardson" <[email protected]> wrote:

>Someone on the list may have a better idea, but I would almost certainly
>start with the PDFDoc created by reading the original document, and inject
>back in the meta-data that you have collected -- I believe this was
>Leonard's recommendation as well.
>
>--josh
>
>On 11/14/11 10:42 PM, "Alec Taylor" <[email protected]> wrote:
>
>>Good afternoon,
>>
>>How would I go about reverse-engineering an XML file generated by
>>pdftohtml -xml back into the [same] PDF?
>>
>>I have been spending a long time extending the XML output to include
>>proper page numbers and header/footer detection.
>>
>>It would be extremely useful if I could push the additional logical
>>structure information and page numbers back into the PDF the XML was
>>generated from.
>>
>>How would I go about doing this?
>>
>>Thanks for all suggestions,
>>
>>Alec Taylor
>>
>>PS: T-9 days (or less!) until PATCH :)
>>_______________________________________________
>>poppler mailing list
>>[email protected]
>>http://lists.freedesktop.org/mailman/listinfo/poppler
