Thanks for the info. Oddly enough, tesseract recognizes the 8 perfectly well for psm values 5 and 6, so I suspect there is more to this than just a bad source image. The 8 must be interpreted in a different way for psm 9 and 10.
Mike

On Friday, January 4, 2013 6:07:22 PM UTC+1, sventech wrote:
>
> Tesseract does not work well for fewer than 4 chars, I think, and your
> image is very pixelated.
> Sven
>
> On Friday, January 4, 2013, Mike wrote:
>
>> Hi, I am still facing an issue where the number 8 is not detected.
>>
>> Here is a way to reproduce the problem using binaries downloaded from the
>> tesseract site.
>> I downloaded the tesseract portable (
>> http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-3.02-win32-portable.zip&can=2&q=)
>> and ran the following command line with the image attached to this post:
>>
>> tesseract.exe -l eng -psm 8 OCR_MONO_DEBUG.jpg test
>>
>> In test.txt I get the following string: "/"
>> I would expect "8". I would really appreciate it if anyone could
>> verify this behaviour on their side.
>>
>> Thanks in advance,
>> Mike
>>
>> On Thursday, September 13, 2012 12:19:13 PM UTC+2, Mike wrote:
>>>
>>> Hi,
>>>
>>> Thanks for the info. I am using revision 700. I have now tried what
>>> "sventech" explained and it improved my results. I will integrate the
>>> latest revision and see if it gets even better.
>>>
>>> On Wednesday, September 12, 2012 11:09:29 PM UTC+2, Stane wrote:
>>>>
>>>> Do the example images work with your code?
>>>>
>>>> If I use the tesseract 3.02 API to detect your image (white 8 on black
>>>> ground), it gets recognized without problems.
>>>> I am using the default PageSegMode and OEM_TESSERACT_ONLY.
>>>> Hope that helps somehow.
>>>>
>>>> On Monday, September 3, 2012 11:06:41 AM UTC+2, Mike wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Maybe someone can point me in the right direction.
>>>>> I use Windows 7 32 bit.
>>>>> When I take the attached image and load it with tesseract.exe
>>>>> (3.01) via the following command:
>>>>>
>>>>> tesseract.exe OCR_MONO_DEBUG.jpg test -l eng -psm 8
>>>>>
>>>>> the result is correct.
>>>>> However, I use the following functions (where image is the attached
>>>>> file read internally by my program and converted to 1 byte mono):
>>>>>
>>>>> pTessBase->SetPageSegMode(tesseract::PSM_SINGLE_WORD);
>>>>> pTessBase->SetImage(pImage, width, height, 1, width);
>>>>> char* ocr_result = pTessBase->GetUTF8Text();
>>>>>
>>>>> Then, oddly enough, I do not get any results; all I get is an empty
>>>>> string. Setting the whitelist to only numbers does not help either.
>>>>> When I have 2 numbers to recognize, such as 81, all works fine.
>>>>>
>>>>> Thanks in advance.
>>>>> Mike
>>>>>
>>>> --
>> You received this message because you are subscribed to the Google
>> Groups "tesseract-ocr" group.
>> To post to this group, send email to [email protected]
>> To unsubscribe from this group, send email to
>> [email protected]
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en
>>
>
> --
> "All that is gold does not glitter,
> not all those who wander are lost;
> the old that is strong does not wither,
> deep roots are not reached by the frost.
> From the ashes a fire shall be woken,
> a light from the shadows shall spring;
> renewed shall be blade that was broken,
> the crownless again shall be king."
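
For anyone reproducing the API call quoted above: the last two arguments of `SetImage(pImage, width, height, 1, width)` are bytes per pixel and bytes per line, so that call assumes an 8-bit buffer whose rows are packed with no padding. The sketch below is one thing that could be checked, not a diagnosis of the problem in this thread. `PackRows` and `src_stride` are hypothetical names, not part of the Tesseract API; the helper copies an image whose rows may be padded (as some image loaders produce) into a tightly packed buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the Tesseract API.
// Copies an 8-bit image whose rows start `src_stride` bytes apart into a
// tightly packed buffer in which every row is exactly `width` bytes long.
// Returns an empty vector if the arguments are inconsistent.
std::vector<std::uint8_t> PackRows(const std::uint8_t* src, int width,
                                   int height, int src_stride) {
    std::vector<std::uint8_t> packed;
    if (src == nullptr || width <= 0 || height <= 0 || src_stride < width) {
        return packed;  // invalid input: nothing is copied
    }
    packed.reserve(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y) {
        // Start of row y in the (possibly padded) source buffer.
        const std::uint8_t* row = src + static_cast<std::size_t>(y) * src_stride;
        // Keep only the first `width` bytes of the row and drop any padding.
        packed.insert(packed.end(), row, row + width);
    }
    return packed;
}
```

With a buffer packed this way, the call from the original post would become `pTessBase->SetImage(packed.data(), width, height, 1, width);`. If the source rows are padded and are not repacked, the fifth argument would need to be the real row stride instead of `width`.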

