On 24 September 2010 18:37, Kevin Carlson <[email protected]> wrote:
> We receive PDF files which appear to contain scanning artifacts which
> severely impact recognition. Specifically, under magnification you
> can see regularly spaced "notches" and corresponding "bumps",
> especially noticeable with vertical lines.
>
> Currently I'm using Ghostscript to convert the files to TIFF for
> processing. Are there any Python-based alternatives out there? Ultimately
> I would like to do all cleaning and converting in Python, with "Pytesser"
> to do the OCR.
>
Unlikely. Ghostscript isn't designed to work as a library, so there's nothing to write a Python wrapper around. PostScript is a whole programming language, and I find it hard to imagine anyone being masochistic enough to write more than a toy implementation of it in a slow, memory-hungry language like Python.

--
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en.
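Because Ghostscript is a standalone program rather than a library, the simplest way to keep a Python-driven pipeline is to call the `gs` executable as a subprocess and pass the resulting TIFFs on to the cleaning and OCR steps. This is a minimal sketch, assuming a Ghostscript binary named `gs` on the PATH (on Windows the console binary is usually `gswin32c` or `gswin64c`). The helper names `build_gs_command` and `pdf_to_tiffs`, the 300 dpi resolution, and the `tiffg4` output device (1-bit CCITT Group 4 TIFF) are illustrative assumptions, not something from the original thread.

```python
import shutil
import subprocess
from pathlib import Path


def build_gs_command(pdf_path, out_pattern, dpi=300, device="tiffg4", gs_exe="gs"):
    """Build a Ghostscript command line that renders every PDF page to TIFF.

    out_pattern should contain a printf-style page counter such as
    "page-%03d.tif"; Ghostscript replaces %03d with the page number,
    starting at 1. (Hypothetical helper for illustration.)
    """
    return [
        gs_exe,
        "-dNOPAUSE",              # don't wait for a keypress between pages
        "-dBATCH",                # exit once the input file is processed
        "-dSAFER",                # restrict file operations by the PostScript code
        "-q",                     # suppress startup banner and progress messages
        f"-sDEVICE={device}",     # e.g. tiffg4 (1-bit) or tiffgray (8-bit grayscale)
        f"-r{dpi}",               # rendering resolution in dots per inch
        f"-sOutputFile={out_pattern}",
        str(pdf_path),
    ]


def pdf_to_tiffs(pdf_path, out_dir, dpi=300, device="tiffg4"):
    """Run Ghostscript and return the TIFF files it wrote (hypothetical helper)."""
    gs_exe = shutil.which("gs")
    if gs_exe is None:
        raise FileNotFoundError("Ghostscript executable 'gs' not found on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    pattern = str(out_dir / "page-%03d.tif")
    # check=True raises CalledProcessError if Ghostscript exits with an error
    subprocess.run(build_gs_command(pdf_path, pattern, dpi, device, gs_exe), check=True)
    # Note: any page-*.tif files already present in out_dir are also returned
    return sorted(out_dir.glob("page-*.tif"))
```

For example, `pdf_to_tiffs("scan.pdf", "pages")` would write `pages/page-001.tif`, `pages/page-002.tif`, and so on. If the cleanup step needs grayscale input rather than already-thresholded images, `device="tiffgray"` may be a better fit than `tiffg4`.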

