Re: [poppler] Reverse-engineering an XML file generated by pdftohtml -xml back into the PDF?

Leonard Rosenthol Tue, 15 Nov 2011 12:29:38 -0800

For proper structure, you are going to need to find a way to match the
structure information with the elements in the content stream and then
somehow modify the stream accordingly (and add the relevant dictionaries,
etc.)
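Here is a minimal Python sketch of what "modify the stream and add the relevant dictionaries" can look like. It is not poppler code and not a complete writer. It assumes each text element in the pdftohtml XML has already been matched to the exact BT ... ET run that paints it, which is the hard part. Each element also carries a role: a standard structure type such as H1 or P, or Artifact for a detected header/footer. The Block class and the three helper functions are hypothetical names for illustration only.

The sketch only emits PDF object syntax as plain strings. The following steps are left to whatever PDF writer you use:

- assigning object numbers
- giving each page a /StructParents entry
- building the /ParentTree number tree
- adding /StructTreeRoot and /MarkInfo << /Marked true >> to the catalog
- rewriting the xref

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One text block recovered from the pdftohtml XML (hypothetical shape).

    ops  -- the content-stream operators that paint this block, e.g. "BT ... ET"
    role -- a standard structure type such as "H1" or "P", or "Artifact"
            for a detected running header/footer
    """
    ops: str
    role: str


def tag_content(blocks):
    """Wrap each block in marked content and assign MCIDs in page order.

    Returns the rewritten content stream and a list whose index is the
    MCID and whose value is the structure type for that MCID.
    """
    parts = []
    mcid_roles = []
    for block in blocks:
        if block.role == "Artifact":
            # Headers/footers are marked as artifacts, so they get no MCID
            # and never appear in the structure tree.
            parts.append(f"/Artifact BMC\n{block.ops}\nEMC")
        else:
            mcid = len(mcid_roles)
            parts.append(f"/{block.role} <</MCID {mcid}>> BDC\n{block.ops}\nEMC")
            mcid_roles.append(block.role)
    return "\n".join(parts), mcid_roles


def struct_elems(mcid_roles, page_ref, parent_ref):
    """One StructElem dictionary (as PDF source text) per marked-content ID.

    /K is the integer MCID; /Pg names the page whose stream holds it.
    """
    return [
        f"<< /Type /StructElem /S /{role} /P {parent_ref} /Pg {page_ref} /K {mcid} >>"
        for mcid, role in enumerate(mcid_roles)
    ]


def page_labels(ranges):
    """A /PageLabels number tree from (first_page_index, style, start) tuples.

    style is a PDF numbering style: "D" decimal, "r"/"R" roman, "a"/"A" letters.
    Page indices are zero-based.
    """
    nums = " ".join(f"{idx} << /S /{style} /St {start} >>" for idx, style, start in ranges)
    return f"<< /Nums [{nums}] >>"
```

MCIDs are scoped to a single page, so tag_content would be called once per page. For the page numbers, page_labels([(0, "r", 1), (4, "D", 1)]) describes roman numerals for the front matter and restarts decimal numbering at the fifth page.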


On 11/15/11 12:23 PM, "Josh Richardson" <[email protected]> wrote:

>Someone on the list may have a better idea, but I would almost certainly
>start with the PDFDoc created by reading the original document, and inject
>back in the meta-data that you have collected -- I believe this was
>Leonard's recommendation as well.
>
>--josh
>
>On 11/14/11 10:42 PM, "Alec Taylor" <[email protected]> wrote:
>
>>Good afternoon,
>>
>>How would I go about reverse-engineering an XML file generated by
>>pdftohtml -xml back into the [same] PDF?
>>
>>I have been spending a long time extending the XML output to include
>>proper page numbers and header/footer detection.
>>
>>It would be extremely useful if I could push the additional logical
>>structure information and page numbers back into the PDF the XML was
>>generated from.
>>
>>How would I go about doing this?
>>
>>Thanks for all suggestions,
>>
>>Alec Taylor
>>
>>PS: T-9 days (or less!) until PATCH :)
>>_______________________________________________
>>poppler mailing list
>>[email protected]
>>http://lists.freedesktop.org/mailman/listinfo/poppler
>>
>
