I just realized that the same image returns different results depending on the Tesseract version. With version 3.01 (http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-3.01-win32-portable.zip&can=2&q=) everything is recognized correctly, whereas with version 3.02 it is not. So, for this case, the code changes between the two versions have decreased accuracy.

Mike

On Friday, January 4, 2013 6:52:13 PM UTC+1, Mike wrote:
>
> Thanks for the info,
>
> oddly enough, tesseract recognizes the 8 perfectly well for psm values 5
> and 6. I suspect there is more to this than just a bad source image.
> The 8 must be interpreted in a different way for psm 9 and 10.
>
> Mike
>
> On Friday, January 4, 2013 6:07:22 PM UTC+1, sventech wrote:
>>
>> Tesseract does not work well for fewer than 4 chars, I think, and your
>> image is very pixelated.
>> Sven
>>
>> On Friday, January 4, 2013, Mike wrote:
>>
>>> Hi, I am still facing an issue where the number 8 is not detected.
>>>
>>> Here is a way to reproduce the problem using binaries downloaded from
>>> the tesseract site.
>>> I downloaded the tesseract portable build
>>> (http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-3.02-win32-portable.zip&can=2&q=)
>>> and ran the following command line on the image attached to this post:
>>> tesseract.exe -l eng -psm 8 OCR_MONO_DEBUG.jpg test
>>> In test.txt I get the following string: "/"
>>> I would expect "8". I would really appreciate it if anyone could
>>> verify this behaviour on their side.
>>>
>>> Thanks in advance,
>>> Mike
>>>
>>> On Thursday, September 13, 2012 12:19:13 PM UTC+2, Mike wrote:
>>>>
>>>> Hi,
>>>>
>>>> Thanks for the info. I am using revision 700; I tried what
>>>> "sventech" explained and it improved my results. I will integrate the
>>>> latest revision and see if it gets even better.
>>>>
>>>> On Wednesday, September 12, 2012 11:09:29 PM UTC+2, Stane wrote:
>>>>>
>>>>> Do the example images work with your code?
>>>>>
>>>>> If I use the tesseract 3.02 API to detect your image (white 8 on black
>>>>> background), it gets recognized without problems.
>>>>> I am using the default PageSegMode and OEM_TESSERACT_ONLY.
>>>>> Hope that helps somehow.
>>>>>
>>>>> On Monday, September 3, 2012 11:06:41 AM UTC+2, Mike wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> maybe someone can point me in the right direction.
>>>>>> I use Windows 7 32 bit.
>>>>>> When I take the attached image and load it with tesseract.exe
>>>>>> (3.01) via the following command:
>>>>>> tesseract.exe OCR_MONO_DEBUG.jpg test -l eng -psm 8
>>>>>> the result is correct.
>>>>>> However, I use the following functions (where pImage is the attached
>>>>>> file, read internally by my program and converted to 1-byte mono):
>>>>>>
>>>>>> pTessBase->SetPageSegMode(tesseract::PSM_SINGLE_WORD);
>>>>>> pTessBase->SetImage(pImage, width, height, 1, width);
>>>>>> char* ocr_result = pTessBase->GetUTF8Text();
>>>>>>
>>>>>> Then, oddly enough, I do not get any result; all I get is an empty
>>>>>> string. Restricting the whitelist to digits does not help either. When
>>>>>> I have 2 digits to recognize, such as 81, everything works fine.
>>>>>>
>>>>>> Thanks in advance.
>>>>>> Mike
>>>>>>
>>>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To post to this group, send email to [email protected]
>>> To unsubscribe from this group, send email to
>>> [email protected]
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>
>>
>>
>> --
>> “All that is gold does not glitter,
>> not all those who wander are lost;
>> the old that is strong does not wither,
>> deep roots are not reached by the frost.
>> From the ashes a fire shall be woken,
>> a light from the shadows shall spring;
>> renewed shall be blade that was broken,
>> the crownless again shall be king.”
>
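In this thread, Stane reports that the 3.02 API recognizes the white-on-black 8, while Mike's own SetImage call returns an empty string. One thing worth ruling out (an assumption, not something confirmed in the thread) is that the buffer handed to SetImage does not really have `width` bytes per row; Windows bitmaps, for example, pad each row to a multiple of 4 bytes. The C++ sketch below repacks such a buffer into tight rows and, optionally, inverts it to dark-on-light and adds a uniform margin. `GrayImage`, `packRows`, `invert`, and `addBorder` are hypothetical helpers, not Tesseract API, and whether inversion or a margin changes the 3.02 result for this image is untested here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for an 8-bit grayscale image whose rows are packed
// back to back, so bytes_per_line == width. That is the layout assumed by the
// call SetImage(pImage, width, height, 1, width) quoted in the thread.
struct GrayImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;  // width * height bytes, row-major
};

// Copies a 1-byte-per-pixel buffer whose rows may be padded (srcStride bytes
// per row, e.g. 4-byte-aligned Windows DIB rows) into a tightly packed image.
GrayImage packRows(const std::uint8_t* src, int width, int height, int srcStride) {
    assert(width > 0 && height > 0 && srcStride >= width);
    GrayImage img{width, height,
                  std::vector<std::uint8_t>(static_cast<std::size_t>(width) * height)};
    for (int y = 0; y < height; ++y) {
        const std::uint8_t* row = src + static_cast<std::size_t>(y) * srcStride;
        for (int x = 0; x < width; ++x) {
            img.pixels[static_cast<std::size_t>(y) * width + x] = row[x];
        }
    }
    return img;
}

// Turns light-on-dark (e.g. a white 8 on black) into dark-on-light in place.
void invert(GrayImage& img) {
    for (std::uint8_t& p : img.pixels) {
        p = static_cast<std::uint8_t>(255 - p);
    }
}

// Returns a copy surrounded by a margin of `border` pixels set to `fill`.
GrayImage addBorder(const GrayImage& img, int border, std::uint8_t fill) {
    assert(border >= 0);
    const int w = img.width + 2 * border;
    const int h = img.height + 2 * border;
    GrayImage out{w, h, std::vector<std::uint8_t>(static_cast<std::size_t>(w) * h, fill)};
    for (int y = 0; y < img.height; ++y) {
        for (int x = 0; x < img.width; ++x) {
            out.pixels[static_cast<std::size_t>(y + border) * w + (x + border)] =
                img.pixels[static_cast<std::size_t>(y) * img.width + x];
        }
    }
    return out;
}
```

The resulting buffer would then be passed exactly like the call in the thread, e.g. `pTessBase->SetImage(out.pixels.data(), out.width, out.height, 1, out.width);`, so that bytes_per_line really matches the data. Keeping the three steps separate makes it easy to try each one on its own and compare the output of GetUTF8Text().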

