In a project I'm working on I have a picture as a data stream, which I
hand over to tesseract after some pre-processing. The resulting text,
after calling TesseractRect, was sometimes confusing. So I saved the
data stream to a file and ran the tesseract binary on it. Surprisingly,
the results there were satisfying.

I then wrote a small test application which compares those two cases.
I load the pre-processed image twice, once with the external library
and once with tesseract itself. The results from the latter case are
again satisfying, while loading the image externally apparently messes
something up.

I think I have narrowed the problem down to either the external library
loading the image beforehand or the capture function in tesseract,
which I use to hand over the buffer to tesseract. The external library
is FreeImage, and I have never had problems with it. Loading and saving
with it does not mess up the image either. It's a TIFF image, so there
is no loss of information while loading/saving. This leaves the capture
function as the probable point of error.

Please note that not every image gets messed up when using capture.
Some TIFF images work fine.

This is the output of my test application:

Passing buffer with capture:
> Test blob assigned to no row on pass 0
> Test blob y=(0,51), row=(10.053820,37.838539), overlap=27.784719
> Test blob assigned to row at (9.87488,38.0728) on pass 4
> Test blob y=(0,51), row=(9.847222,38.485786), overlap=28.638565
> Test blob assigned to row at (9.84722,38.4858) on pass 1
> Result: \'S\m\\*a.\x\$\\‘é>'&.\

Loading with read:
> Image has 8 bits per pixel and size (371,52)
> Result: Stone (40,000 B.C.)

There seem to be some problems when using capture with this image, as
this output indicates.
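
One thing that might be worth checking, offered as a sketch and not as a
confirmed diagnosis: FreeImage stores bitmaps bottom-up and pads each
scanline to a 32-bit boundary. For the 8-bit image of width 371 reported
above, that gives a pitch of 372 bytes rather than 371. If the capture
path assumes top-down, tightly packed rows (an assumption here), every
row would be shifted by one byte relative to the previous one, and images
whose row size happens to be a multiple of 4 bytes would not show that
shift. The helper names padded_pitch and repack_top_down below are
hypothetical and not part of either library.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Bytes per scanline when rows are padded to a 32-bit boundary
// (the layout FreeImage reports through FreeImage_GetPitch).
std::size_t padded_pitch(std::size_t width, std::size_t bits_per_pixel) {
    return ((width * bits_per_pixel + 31) / 32) * 4;
}

// Hypothetical helper: copy a bottom-up, padded bitmap into a top-down,
// tightly packed buffer where one row is exactly width * bytes_per_pixel.
std::vector<std::uint8_t> repack_top_down(const std::uint8_t* bits,
                                          std::size_t width,
                                          std::size_t height,
                                          std::size_t bytes_per_pixel,
                                          std::size_t pitch) {
    const std::size_t row_bytes = width * bytes_per_pixel;
    std::vector<std::uint8_t> out(row_bytes * height);
    for (std::size_t y = 0; y < height; ++y) {
        // The first row in memory is the bottom row of the picture,
        // so read the source rows in reverse order.
        const std::uint8_t* src = bits + (height - 1 - y) * pitch;
        std::memcpy(out.data() + y * row_bytes, src, row_bytes);
    }
    return out;
}
```

In the test application, bits and pitch would come from FreeImage_GetBits
and FreeImage_GetPitch, and the repacked buffer would be handed to
tesseract with a line length of width * bytes_per_pixel. Whether this
actually fixes the output above is untested.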

I use tesseract v.2.03 from the hardy repository of Ubuntu, but
building it myself does not change anything.

It would be great if someone could take the time to help me out with
this problem. I hope I gave enough information; if not, please ask. As
I don't know what tesseract's output in the test application means, it's
probably best to start there. Thanks.