Re: [Tutor] extracting informations (images and text) from a PDF and creating a database from it

Alan Gauld Tue, 29 Dec 2009 01:31:47 -0800

"Shashwat Anand" <[email protected]> wrote

I need to make a database from some PDFs. I need to extract the logos as well
as the information (i.e. name, address) beneath each logo and enter it into
the database. A logo can be text or a picture, as shown in two screenshots
from one of the sample PDF files:
http://imagebin.org/77378
http://imagebin.org/77379


You could try PDFMiner to extract the content directly from the PDF using Python.
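
As a rough sketch of that route, something like the following might be a
starting point. It assumes the pdfminer.six package (a maintained fork of
PDFMiner) and its extract_text() helper, and it assumes, purely for
illustration, that each entry comes out as a block of lines separated by a
blank line, with the name on the first line and the address on the rest.
The function names, table name and column names are made up for the example.
It handles text only; picture logos would need separate treatment.

```python
import sqlite3


def pdf_to_text(path):
    """Return the text of a PDF. Assumes pdfminer.six is installed."""
    # Imported lazily so the parsing and database code below can be
    # tried out without the package.
    from pdfminer.high_level import extract_text
    return extract_text(path)


def parse_entries(text):
    """Split text into (name, address) pairs.

    Assumption: entries are separated by a blank line, the first line
    of each entry is the name and the remaining lines are the address.
    """
    entries = []
    for block in text.split("\n\n"):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if len(lines) >= 2:
            entries.append((lines[0], ", ".join(lines[1:])))
    return entries


def save_entries(conn, entries):
    """Store the pairs in a hypothetical 'company' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS company (name TEXT, address TEXT)"
    )
    conn.executemany(
        "INSERT INTO company (name, address) VALUES (?, ?)", entries
    )
    conn.commit()
```

Typical use would be save_entries(sqlite3.connect("logos.db"),
parse_entries(pdf_to_text("sample.pdf"))), where both file names are
placeholders. Real PDFs rarely come out this tidily, so expect to adjust
parse_entries() to the layout you actually get.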

Would converting to HTML be a good option? Later on I need to apply some image
processing too. What would be the ideal way to approach it?


Converting to HTML (assuming you have a tool to do that!) may be better,
since there is a wider choice of tools and more experience to help you.
Alternatively, there are various commercial tools for converting PDF into
Word etc.
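
If you do go via HTML, the standard library's html.parser may be enough to
pull out the image references and the text. The sketch below is only an
illustration: the class and function names are invented, and it assumes the
converter emits ordinary <img src="..."> tags for picture logos, which
depends entirely on the tool you use.

```python
from html.parser import HTMLParser


class LogoAndTextCollector(HTMLParser):
    """Collect image sources and non-blank text, in document order."""

    def __init__(self):
        super().__init__()
        self.images = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Picture logos are assumed to appear as <img src="..."> tags.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

    def handle_data(self, data):
        # Keep only text that is not pure whitespace.
        if data.strip():
            self.text.append(data.strip())


def collect(html):
    """Return (image sources, text fragments) found in an HTML string."""
    parser = LogoAndTextCollector()
    parser.feed(html)
    parser.close()
    return parser.images, parser.text
```

The image paths collected this way could then be handed to whatever image
processing library you choose later on.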

I've never personally had to extract data from a PDF; I've always had access
to the source documents, so I can't comment on how effective each approach
is...

--
Alan Gauld
Author of the Learn to Program web site

http://www.alan-g.me.uk/


_______________________________________________
Tutor maillist  -  [email protected]
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
