Re: [Tutor] extracting information (images and text) from a PDF and creating a database from it

Didar Hossain Tue, 29 Dec 2009 22:11:52 -0800

On Tue, Dec 29, 2009 at 3:21 PM, Shashwat Anand
<[email protected]> wrote:
> I used PDFMiner and I was pretty satisfied with the text portions. I
> retrieved all the text and was able to manipulate it according to my wish.
> However, I failed on the image part. So technically my question reduces to: 'If
> there is a PDF document with an image and some verbose text below it, and the
> pattern is followed, i.e. each page of the PDF has one image and some text
> following it, how can I retrieve both the images and the text without loss?'


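You can use `pdftohtml' [http://pdftohtml.sf.net]. It converts the text to HTML (or to XML with `-xml') and writes the embedded images out as separate files, so you get both the text and the images of each page. It is available on Ubuntu in the poppler-utils package.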
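A minimal sketch of that route in Python, assuming Poppler's `pdftohtml' is on PATH. With `-xml' it is assumed to write <prefix>.xml, in which each <page> element holds <text> elements and <image src="..."> elements pointing at the extracted image files. The output file name, the element names and the `pages' table layout are assumptions to check against your version's output; the function names are hypothetical.

```python
import sqlite3
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path


def run_pdftohtml(pdf_path, out_prefix):
    """Run pdftohtml in XML mode and return the path of the XML file.

    Assumption: the XML is written to <out_prefix>.xml and images are
    saved as separate files referenced from it.
    """
    # -xml: XML output for post-processing; -q: don't print messages
    subprocess.run(["pdftohtml", "-xml", "-q", str(pdf_path), str(out_prefix)],
                   check=True)
    return Path(str(out_prefix) + ".xml")


def parse_pages(xml_bytes):
    """Return a list of (page_number, text, [image_src, ...]), one per page."""
    root = ET.fromstring(xml_bytes)
    pages = []
    for page in root.iter("page"):
        number = int(page.get("number"))
        # <text> elements may wrap <b>/<i>, so join all nested text
        lines = ["".join(t.itertext()).strip() for t in page.iter("text")]
        text = "\n".join(line for line in lines if line)
        images = [img.get("src") for img in page.iter("image")]
        pages.append((number, text, images))
    return pages


def store_pages(conn, pages):
    """Store one row per page: its text plus the image file path(s)."""
    conn.execute("CREATE TABLE IF NOT EXISTS pages "
                 "(number INTEGER PRIMARY KEY, text TEXT, images TEXT)")
    # Several images on one page are joined with ';'
    conn.executemany("INSERT OR REPLACE INTO pages VALUES (?, ?, ?)",
                     [(n, text, ";".join(imgs)) for n, text, imgs in pages])
    conn.commit()
```

Usage: xml_path = run_pdftohtml("book.pdf", "book"), then store_pages(sqlite3.connect("book.db"), parse_pages(xml_path.read_bytes())). Storing the image file paths rather than the image bytes keeps the table small; use a BLOB column instead if the database should be self-contained.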

Regards,
Didar
_______________________________________________
Tutor maillist  -  [email protected]
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor