Re: [poppler] Reverse-engineering an XML file generated by pdftohtml -xml back into the PDF?

Josh Richardson Tue, 15 Nov 2011 12:24:46 -0800

Someone on the list may have a better idea, but I would almost certainly
start with the PDFDoc created by reading the original document, and inject
back in the meta-data that you have collected -- I believe this was
Leonard's recommendation as well.
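
As a rough sketch of the first half of that approach, the collected
information could be read out of the extended XML into a per-page table
before anything is written to the PDF. The <page number="..."> and <text>
elements follow the usual pdftohtml -xml layout; the "logical" and "role"
attributes are hypothetical stand-ins for whatever the extended output
actually uses, and the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the general shape of pdftohtml -xml output.
# "logical" and "role" are hypothetical attributes standing in for the
# page-number and header/footer extensions described in the question.
SAMPLE_XML = """<pdf2xml>
  <page number="1" logical="i" height="1188" width="918">
    <text top="40" left="100" width="300" height="12" font="0" role="header">Chapter 1</text>
    <text top="200" left="100" width="500" height="12" font="0">Body text</text>
  </page>
  <page number="2" logical="1" height="1188" width="918">
    <text top="1150" left="450" width="10" height="12" font="0" role="footer">1</text>
  </page>
</pdf2xml>"""


def collect_page_metadata(xml_text):
    """Map each physical page number to its logical label, headers and footers."""
    root = ET.fromstring(xml_text)
    pages = {}
    for page in root.iter("page"):
        number = int(page.get("number"))
        info = {"label": page.get("logical"), "headers": [], "footers": []}
        for text in page.iter("text"):
            # itertext() also picks up text nested inside <b>, <i> or <a>.
            content = "".join(text.itertext()).strip()
            role = text.get("role")
            if role == "header":
                info["headers"].append(content)
            elif role == "footer":
                info["footers"].append(content)
        pages[number] = info
    return pages


metadata = collect_page_metadata(SAMPLE_XML)
for number, info in sorted(metadata.items()):
    print(number, info)
```

The resulting table is keyed by physical page, so it can be matched against
the pages of the original document when the information is injected back
in. The writing step itself is left out here because it depends on which
PDF library is used.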


--josh

On 11/14/11 10:42 PM, "Alec Taylor" <[email protected]> wrote:

>Good afternoon,
>
>How would I go about reverse-engineering an XML file generated by
>pdftohtml -xml back into the [same] PDF?
>
>I have been spending a long time extending the XML output to include
>proper page numbers and header/footer detection.
>
>It would be extremely useful if I could push the additional logical
>structure information and page numbers back into the PDF the XML was
>generated from.
>
>How would I go about doing this?
>
>Thanks for all suggestions,
>
>Alec Taylor
>
>PS: T-9 days (or less!) until PATCH :)
>_______________________________________________
>poppler mailing list
>[email protected]
>http://lists.freedesktop.org/mailman/listinfo/poppler
>

