[poppler] Extract title from pdf file.

Alec Taylor Wed, 09 Nov 2011 07:02:31 -0800

2011/11/10 Leonard Rosenthol <[email protected]>:
> On 11/9/11 1:26 AM, "Alec Taylor" <[email protected]> wrote:
>
>>The easiest way I can think of is to grab it from the headers and footers.
>>
>>I am about to submit a patch (any day now) which separate the header
>>and footers into separate tags from which you can access from
>>pdftohtml -xml.
>
> Are you also submitting patches to read & process any tags & structure in
> the PDF?  If the PDF is already tagged, then it will have any
> headers/footers already identified accordingly.  You should be using this
> when present.


Yes, I am using the RapidXML library, which I chose specifically for
its speed and because it is header-only.

The patch will be submitted within the next 3 days, if not sooner.
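
To illustrate the general idea of pulling a title out of pdftohtml -xml
output, here is a minimal sketch in Python. It is not the patch, and it does
not use the header/footer tags the patch will introduce, since their format
is not shown here. It assumes the existing output layout (fontspec and text
elements inside a page element) and uses a simple heuristic: the title is
taken to be the text set in the largest font on the first page. The SAMPLE
document and the guess_title name are hypothetical, for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample that imitates the general shape of `pdftohtml -xml`
# output. A real file would also carry an XML declaration and a DOCTYPE.
SAMPLE = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<fontspec id="0" size="24" family="Times" color="#000000"/>
<fontspec id="1" size="11" family="Times" color="#000000"/>
<text top="40" left="100" width="200" height="12" font="1">Technical Report 42</text>
<text top="120" left="100" width="500" height="26" font="0"><b>Extracting Titles</b></text>
<text top="150" left="100" width="500" height="26" font="0">from PDF Files</text>
<text top="220" left="100" width="600" height="12" font="1">Body text starts here.</text>
</page>
</pdf2xml>"""


def guess_title(xml_text):
    """Guess a title: the largest-font text on the first page (heuristic)."""
    root = ET.fromstring(xml_text)
    page = root.find("page")
    if page is None:
        return None

    # Map each font id to its point size.
    sizes = {f.get("id"): float(f.get("size", "0")) for f in page.iter("fontspec")}

    # Collect (font size, vertical position, text) for every non-empty run.
    runs = []
    for t in page.iter("text"):
        content = "".join(t.itertext()).strip()  # also picks up text inside <b>/<i>
        if content:
            runs.append((sizes.get(t.get("font"), 0.0), int(t.get("top", "0")), content))
    if not runs:
        return None

    # Keep only the runs in the largest font and join them top to bottom.
    largest = max(size for size, _, _ in runs)
    lines = sorted((top, text) for size, top, text in runs if size == largest)
    return " ".join(text for _, text in lines)


if __name__ == "__main__":
    print(guess_title(SAMPLE))
```

A heuristic like this fails when a page header or a decorative element uses
the largest font, which is one reason to prefer the tags and structure of the
PDF itself when they are present.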

>
>>I will then work on incorporating it all back into the PDF, with ToC
>>linkage (I will make a new pdftopdf utility).
>
> So are you also writing the structure back into the PDF?

That's the plan; however, that may take a while longer. I'll need to
check what kind of helpers the current API provides. For instance, I
haven't had a chance to read the poppler-toc.cc file yet.
Would that be helpful?

> Leonard
>
>
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
