In a project I'm working on, I have a picture as a data stream, which I hand over to tesseract after some pre-processing. The resulting text after calling TesseractRect was sometimes rather puzzling. So I saved the data stream to a file and ran the tesseract binary on it. Surprisingly, the results there were satisfying.
I then wrote a small test application which compares those two cases. I load the pre-processed image twice, once with the external library and once with tesseract itself. The results from the latter case are again satisfying, while loading the image externally obviously messes something up.

I think I have narrowed the problem down to either the external library loading the image beforehand, or the capture function in tesseract, which I use to hand the buffer over to tesseract. As the external library I use FreeImage, and I have never had problems with it. Loading and saving does not mess up the image either. It's a TIFF image, so there is no loss of information while loading/saving. This leaves the capture function as the probable point of error. Please note that not every image gets messed up when using capture. Some TIFF images work fine.

This is the output of my test application:

Passing buffer with capture:
> Test blob assigned to no row on pass 0
> Test blob y=(0,51), row=(10.053820,37.838539), overlap=27.784719
> Test blob assigned to row at (9.87488,38.0728) on pass 4
> Test blob y=(0,51), row=(9.847222,38.485786), overlap=28.638565
> Test blob assigned to row at (9.84722,38.4858) on pass 1
> Result: \'S\m\\*a.\x\$\\‘é>'&.\

Loading with read:
> Image has 8 bits per pixel and size (371,52)
> Result: Stone (40,000 B.C.)

As this output indicates, there seem to be some problems when using capture with this image. I use tesseract v.2.03 from the hardy repository of Ubuntu, but building it myself does not change anything.

It would be great if someone could take the time to help me out with this problem. I hope I gave enough information; if not, please ask. As I don't know what tesseract's output in the test application means, it's probably best to start there.

Thanks.
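One thing that may be worth ruling out when a FreeImage buffer is passed on directly: FreeImage keeps scanlines bottom-up and pads each row to a 4-byte boundary, so for an 8-bit image of width 371 the stored row length (what FreeImage_GetPitch reports) would be 372 bytes, not 371. Widths that are already a multiple of 4 have no padding, which might be related to only some images failing, though that is a guess. The sketch below is not the code from the test application; to_top_down_packed is a hypothetical helper, and it assumes 8 bits per pixel. It copies such a buffer into a tightly packed, top-down one, so that the bytes-per-line value handed to tesseract can simply be the width.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of FreeImage or tesseract).
// Copies an 8-bit greyscale image that is stored bottom-up with padded
// rows into a tightly packed, top-down buffer.
//   src    - first byte of the stored image (bottom row comes first)
//   width  - pixels per row (one byte per pixel is assumed)
//   height - number of rows
//   pitch  - bytes per stored row including padding (pitch >= width)
std::vector<unsigned char> to_top_down_packed(const unsigned char* src,
                                              int width, int height,
                                              int pitch) {
    assert(src != nullptr && width > 0 && height > 0 && pitch >= width);

    std::vector<unsigned char> dst(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y) {
        // Output row y is input row (height - 1 - y) in bottom-up storage.
        const unsigned char* in =
            src + static_cast<std::size_t>(height - 1 - y) * pitch;
        // Copy only the pixel bytes, dropping the padding at the row end.
        std::memcpy(&dst[static_cast<std::size_t>(y) * width], in,
                    static_cast<std::size_t>(width));
    }
    return dst;
}
```

With FreeImage, src and pitch would presumably come from FreeImage_GetBits and FreeImage_GetPitch. If the output of the two paths still differs after such a copy, the row layout is probably not the cause.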

