Hi! We use an approach based on character properties to extract a meaningful title from the document text. The metadata usually just stores the filename in the title field.
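To make the idea concrete, here is a minimal sketch of one such heuristic. The message above does not say which character properties are used, so this assumes font size is the deciding property and that text spans from the first page have already been extracted by some other means. The names `Span` and `guess_title` and the parameters `min_chars` and `tolerance` are hypothetical, not part of poppler.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Span:
    """A run of text from the first page (hypothetical structure)."""
    text: str
    font_size: float  # in points
    y: float          # distance from the top of the page


def guess_title(spans: Iterable[Span],
                min_chars: int = 4,
                tolerance: float = 0.5) -> Optional[str]:
    """Guess a title: the text set in the largest font on the page.

    Assumption: the title is the largest text on the first page.
    Spans shorter than min_chars are ignored (drop caps, page numbers).
    """
    candidates = [s for s in spans if len(s.text.strip()) >= min_chars]
    if not candidates:
        return None

    largest = max(s.font_size for s in candidates)

    # Keep every span whose size is within `tolerance` points of the largest,
    # so that a title wrapped over several lines is collected whole.
    title_spans = [s for s in candidates if largest - s.font_size <= tolerance]

    # Read the title lines from top to bottom.
    title_spans.sort(key=lambda s: s.y)
    return " ".join(s.text.strip() for s in title_spans)
```

A result like this could then be compared with the metadata title field, and preferred when that field only holds a filename. Real documents would need more properties than font size (position, boldness, line length), so treat this only as an illustration.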
-- Peter

On Wednesday 09 November 2011 16:16:14 Alec Taylor wrote:
> On Wed, Nov 9, 2011 at 10:37 PM, Albert Astals Cid <[email protected]> wrote:
> > On Wednesday, 9 November 2011, Alec Taylor wrote:
> >> Incorrect, all getDocInfo tells you is what the meta info says, it
> >> doesn't analyse the actual document, whereas my pdftopdf will update
> >> the metadata with the appropriate info after PDF analysis
> >
> > Please do not top post, makes reading e-mail incredibly hard.
> >
> > And no it is not incorrect, if the metadata does not have a title, then
> > the document does not have a title as defined per the spec.
> >
> > Albert
>
> But maybe the document doesn't have a title, because it was grabbed
> from scanning the book, then OCRing it. So what I will facilitate is
> the generation of proper metadata (+ more) from a current PDF lacking
> such.
>
> So if the document does have a title, my pdftopdf tool will find it,
> and add it to the metadata.
>
> I will contribute pdftopdf to poppler.
> _______________________________________________
> poppler mailing list
> [email protected]
> http://lists.freedesktop.org/mailman/listinfo/poppler
