Re: any way to turn a pdf file into something OCR-able?

2008-12-02 Thread Gary Kline
On Tue, Dec 02, 2008 at 02:22:27PM -0500, Chris Shenton wrote:
> Gary Kline <[EMAIL PROTECTED]> writes:
>
> > pdftotext fail on the large [32MB] file I've got. Is there any other way I
> > can translate this huge textfile to ascii or html or text?
>
> I wrote some code using Python

Re: any way to turn a pdf file into something OCR-able?

2008-12-02 Thread Gary Kline
On Tue, Dec 02, 2008 at 02:07:30AM +0100, Roland Smith wrote:
> On Mon, Dec 01, 2008 at 03:14:43PM -0800, Gary Kline wrote:
> > pdftotext fail on the large [32MB] file I've got. Is there any
> > other way I can translate this huge textfile to ascii or html or
> > text?
>
> Please defi

Re: any way to turn a pdf file into something OCR-able?

2008-12-02 Thread Roland Smith
On Mon, Dec 01, 2008 at 08:23:09PM -0500, Robert Huff wrote:
>
> Roland Smith writes:
>
> > > pdftotext fail on the large [32MB] file I've got. Is there any
> > > other way I can translate this huge textfile to ascii or html or
> > > text?
> >
> > Please define "fail" in this context

Re: any way to turn a pdf file into something OCR-able?

2008-12-01 Thread Olivier Nicole
> > 1) Some PDFs are just wrappers around JPEG images. In this case
> > there is no text for pdftotext to convert => epic fail.
>
> In this case "convert" from the ImageMagick port will get you a
> series of .jpg/.gif/.. Read the manual carefully before
> attempting; also note this can be
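
The two cases described in this message (a PDF with real text versus a PDF that only wraps page images) can be told apart by checking whether pdftotext produces anything other than whitespace. The following is only a rough sketch of that idea, not code from the thread: the function names, the 20-character threshold, the 300 dpi density, and the page-%03d.jpg naming are all assumptions made for illustration. It assumes pdftotext and ImageMagick's convert are installed and on the PATH.

```python
import subprocess
from pathlib import Path
from typing import List


def looks_like_text(extracted: str, min_chars: int = 20) -> bool:
    """True if the output holds at least min_chars non-whitespace characters.

    The threshold is an arbitrary assumption; image-only PDFs usually
    yield nothing but form feeds and newlines.
    """
    return len("".join(extracted.split())) >= min_chars


def pdftotext_cmd(pdf: Path) -> List[str]:
    # "-" as the output file makes pdftotext write to stdout.
    return ["pdftotext", str(pdf), "-"]


def convert_cmd(pdf: Path, out_dir: Path, density: int = 300) -> List[str]:
    # ImageMagick writes one image per page; %03d is the page counter.
    # The density value (dpi) is an assumed default for OCR input.
    return ["convert", "-density", str(density), str(pdf),
            str(out_dir / "page-%03d.jpg")]


def extract_or_rasterize(pdf: Path, out_dir: Path) -> str:
    """Return extracted text, or rasterize the pages and return ""."""
    result = subprocess.run(pdftotext_cmd(pdf), capture_output=True,
                            text=True, check=True)
    if looks_like_text(result.stdout):
        return result.stdout
    # No usable text layer: fall back to page images for an OCR tool.
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(convert_cmd(pdf, out_dir), check=True)
    return ""
```

Calling extract_or_rasterize(Path("big.pdf"), Path("pages")) would either return the text or leave one JPEG per page in pages/ to feed to an OCR program. Rasterizing a large PDF at high density can use a lot of memory and disk space, so it is worth trying on a few pages first.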

Re: any way to turn a pdf file into something OCR-able?

2008-12-01 Thread Robert Huff
Roland Smith writes:
> > pdftotext fail on the large [32MB] file I've got. Is there any
> > other way I can translate this huge textfile to ascii or html or
> > text?
>
> Please define "fail" in this context? I've used pdftotext on
> documents exceeding 40MB. However there are of

Re: any way to turn a pdf file into something OCR-able?

2008-12-01 Thread Roland Smith
On Mon, Dec 01, 2008 at 03:14:43PM -0800, Gary Kline wrote:
> pdftotext fail on the large [32MB] file I've got. Is there any
> other way I can translate this huge textfile to ascii or html or
> text?

Please define "fail" in this context? I've used pdftotext on documents exceeding