2011/11/10 Leonard Rosenthol <[email protected]>:
> On 11/9/11 1:26 AM, "Alec Taylor" <[email protected]> wrote:
>
>>The easiest way I can think of is to grab it from the headers and footers.
>>
>>I am about to submit a patch (any day now) which separate the header
>>and footers into separate tags from which you can access from
>>pdftohtml -xml.
>
> Are you also submitting patches to read & process any tags & structure in
> the PDF? If the PDF is already tagged, then it will have any
> headers/footers already identified accordingly. You should be using this
> when present.

Yes, I am using the RapidXML library, which I chose specifically for its
speed and because it is header-only. The patch will be submitted within
the next 3 days, if not earlier.

>
>>I will then work on incorporating it all back into the PDF, with ToC
>>linkage (I will make a new pdftopdf utility).
>
> So are you also writing the structure back into the PDF?

That's the plan; however, it may take a while longer, as I'll need to
check up on what kind of helpers the current API provides. For instance,
I haven't had a chance to read the poppler-toc.cc file. Would that be
helpful?

> Leonard
