On Sun, Apr 11, 2010 at 3:42 AM, Brad Hards <[email protected]> wrote:
> On Thursday 08 April 2010 08:35:15 pm Mathieu Malaterre wrote:
>> This is slightly off topic for poppler. I am looking for a way to read
>> the meta information of a PDF file (basically the output of pdfinfo).
> This isn't a lot of context to work with, so I'm guessing what might work for
> you.
>> I find it a little bit cumbersome to integrate poppler (license issue,
>> no real need for a full rendering PDF library). Could someone suggest
>> another solution for reading that meta information from PDF files?
> If you don't want to use poppler / pdfinfo, you could buy the Adobe libraries,
> or you could try pdftk. Podofo may also be a possibility.

I should have mentioned that this is for integration in an open-source,
cross-platform toolkit with a BSD license. For now I use tricks to link
against the private headers of the system-installed poppler (due to API
changes), but I still lack a PDF parser for Win32 platforms.

>> Would a simple regex (such as "<rdf:RDF.*</rdf:RDF>") work?
> I do not think this will work in general. It might work for all the PDF files
> you care about though. Read the PDF specification (Section 10.2.2 or
> thereabouts) for information on the metadata stream(s).

If I find some time, I might get started with this Python parser I found
on the net:
http://blog.didierstevens.com/programs/pdf-tools/

It is self-contained, and it focuses on exactly what I am looking for: a
stream interface for PDF, the equivalent of what SAX is for XML.

Thanks anyway,
--
Mathieu
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
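
To make the regex question in this thread concrete, here is a minimal Python sketch (Python because the parser mentioned in the thread is also written in Python) that scans the raw bytes of a PDF for XMP packets. It assumes the metadata stream is stored uncompressed and unencrypted: a Flate-compressed stream, an encrypted file, or a packet inside a compressed object stream will simply not match, which is one reason a regex cannot work in general. It also ignores the document Info dictionary (Title, Author, Producer and so on), which is where much of the pdfinfo output comes from. The function names `find_xmp_packets` and `read_xmp_from_file` are hypothetical.

```python
import re

# Non-greedy, DOTALL match from an opening <rdf:RDF tag to the nearest closing tag.
# Assumption: the XMP packet sits in the file as plain (unfiltered, unencrypted)
# bytes; compressed or encrypted metadata streams will not be found this way.
RDF_PATTERN = re.compile(rb"<rdf:RDF\b.*?</rdf:RDF>", re.DOTALL)


def find_xmp_packets(pdf_bytes: bytes) -> list:
    """Return every uncompressed <rdf:RDF>...</rdf:RDF> block found in the raw bytes."""
    return [
        m.group(0).decode("utf-8", errors="replace")
        for m in RDF_PATTERN.finditer(pdf_bytes)
    ]


def read_xmp_from_file(path: str) -> list:
    """Read a whole PDF file as bytes and scan it (hypothetical helper)."""
    with open(path, "rb") as fh:
        return find_xmp_packets(fh.read())


if __name__ == "__main__":
    # Hypothetical fragment resembling an uncompressed metadata stream (not a valid PDF).
    sample = (
        b"1 0 obj\n<< /Type /Metadata /Subtype /XML >>\nstream\n"
        b'<x:xmpmeta xmlns:x="adobe:ns:meta/">\n'
        b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">\n'
        b"</rdf:RDF>\n</x:xmpmeta>\nendstream\nendobj\n"
    )
    for packet in find_xmp_packets(sample):
        print(packet)
```

The pattern deliberately differs from the one quoted in the thread: it uses a non-greedy `.*?` together with `re.DOTALL`. Without `DOTALL`, `.` does not match newlines, so a packet spread over several lines is missed; with a greedy `.*`, a file carrying more than one metadata stream (for example, XMP attached to a page or an image as well as to the document) would be merged into a single match running from the first `<rdf:RDF` to the last `</rdf:RDF>`.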
