The best converter I have found so far is pdftotext from Glyph & Cog 
(http://www.glyphandcog.com/), who maintain it as part of the open source 
Xpdf project at http://www.foolabs.com/xpdf/.

It's not a Python library, but you can call pdftotext from within Python 
using os.system().  I used the pdftotext -layout option, and that gave the 
best result.  hth.
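
For anyone who wants to try this, here is a minimal sketch of calling 
pdftotext from Python. It uses subprocess.run() instead of os.system(), 
which is a substitution on my part, so that a failed conversion raises an 
error. The function names and the file names history.pdf and history.txt 
are hypothetical, and the sketch assumes pdftotext is installed and on 
your PATH.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, txt_path, keep_layout=True):
    """Return the argument list for a pdftotext call (hypothetical helper)."""
    command = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to preserve the physical page layout.
        command.append("-layout")
    command.extend([pdf_path, txt_path])
    return command


def pdf_to_text(pdf_path, txt_path, keep_layout=True):
    """Run pdftotext on pdf_path and write the text to txt_path."""
    # Assumption: pdftotext is installed separately; it is not a Python package.
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    # check=True raises CalledProcessError if pdftotext exits with an error.
    subprocess.run(
        build_pdftotext_command(pdf_path, txt_path, keep_layout), check=True
    )
    return txt_path
```

Calling pdf_to_text("history.pdf", "history.txt") should leave the 
converted text in history.txt, provided the PDF exists and pdftotext can 
read it.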

dinesh


--------------------------------------------------------------------------------

Message: 4
Date: Tue, 21 Apr 2009 18:37:39 -0400
From: Robert Berman <[email protected]>
Subject: Re: [Tutor] PDF to text conversion
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

First, thanks to everyone who contributed to this thread. I have a 
number of possible solutions and a number of paths to pursue to 
determine which avenue I should take to resolve this remaining issue. I 
did try the itools library and while everything installed nicely, most 
of the tests failed so I am not particularly overjoyed with the results.

Thank you Dinesh for the vote of sympathy. I do appreciate it.

I did use Adobe Reader to convert the history PDF file into a text file 
and it did seem to do it faithfully. So now I will work out a parsing 
function to extract my data and send it to a SQLite database.
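
A rough sketch of what such a parsing step might look like with the 
standard sqlite3 module. The record layout is purely an assumption for 
illustration: the real text file's format is not described here, so the 
"date | description | amount" lines, the table name history, and the 
function names are all hypothetical.

```python
import sqlite3


def parse_line(line):
    """Split one 'date | description | amount' line; None if it does not fit.

    The three-field layout is an assumption, not the real file format.
    """
    parts = [part.strip() for part in line.split("|")]
    if len(parts) != 3:
        return None
    date, description, amount = parts
    try:
        return date, description, float(amount)
    except ValueError:
        # The last field was not a number, so treat the line as noise.
        return None


def load_lines(lines, connection):
    """Insert every parseable line into a (hypothetical) history table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS history "
        "(date TEXT, description TEXT, amount REAL)"
    )
    rows = [row for row in map(parse_line, lines) if row is not None]
    # Parameter placeholders let sqlite3 handle quoting of the values.
    connection.executemany("INSERT INTO history VALUES (?, ?, ?)", rows)
    connection.commit()
    return len(rows)


# Made-up sample lines, including one that should be skipped.
sample = [
    "2009-04-01 | Opening entry | 100.0",
    "Page 1 of 3",
    "2009-04-02 | Second entry | -25.5",
]
connection = sqlite3.connect(":memory:")
inserted = load_lines(sample, connection)
```

Lines that do not match the assumed layout, such as page headers left 
over from the PDF, are skipped rather than inserted.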

I am thrilled both with the number of suggestions I have received from 
this group and the quality of the suggestions.

Thanks again,

Robert Berman

_______________________________________________
Tutor maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/tutor
