On Thu, 17 Jul 2008, Antoine Jacoutot wrote:
> On Wed, 16 Jul 2008, jared r r spiegel wrote:
>
> > cranky old OCR engine that apparently sucks less than most
> > other ones out there ??. friend of mine asked for it in response
> > to seeing something on groklaw where they used it with image-based PDFs
> > and xpdf or something to snarf the text out of them
> >
> > without the stuff in ${SUPDISTFILES}, the user has to train the
> > OCR engine, which is reasonably documented on their wiki but
> > also looks laborious and annoying if you don't otherwise need
> > that level of accuracy, hence grabbing the SUPDISTFILE stuff.
> >
> > apache license 2.0
>
>
> I'll take care of that.
> Thanks for your submission.Ok... I finally got some time to look into this. I reworked your port so that language files are in corresponding subpackages. I also tweaked the patches a bit, changed DESCR, provide doc and samples... I assume you still want to maintain this, right? Anyway, it seems to work fine on i386. (for those who don't have a scanner, two sample file are provided for testing). Comments/OK? -- Antoine
tesseract.tar.gz
Description: Binary data
