Thanks for the info,

Oddly enough, Tesseract recognizes the 8 perfectly well for psm values 5 and 
6. I suspect there is more to this than just a bad source image. The 
8 must be interpreted differently for psm 9 and 10.
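
To compare the modes side by side, the command from the earlier post can be
repeated once per psm value. The following is only a minimal sketch: the
helper names build_psm_command and build_psm_sweep and the output prefix
test_psm are made up for illustration, and the "-psm N" flag form is simply
the one used elsewhere in this thread.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: builds the command line for one page segmentation mode.
// The executable name, language, flag form and image name are taken from the
// command quoted in this thread; the output prefix is an assumption.
std::string build_psm_command(int psm,
                              const std::string& image = "OCR_MONO_DEBUG.jpg",
                              const std::string& out_prefix = "test_psm") {
    return "tesseract.exe -l eng -psm " + std::to_string(psm) + " " + image +
           " " + out_prefix + std::to_string(psm);
}

// Hypothetical helper: one command per psm value in [first_psm, last_psm].
std::vector<std::string> build_psm_sweep(int first_psm, int last_psm) {
    std::vector<std::string> commands;
    for (int psm = first_psm; psm <= last_psm; ++psm) {
        commands.push_back(build_psm_command(psm));
    }
    return commands;
}
```

Calling build_psm_sweep(5, 10) yields six commands, each writing to its own
output file (test_psm5.txt and so on), so the results can be compared directly.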

Mike

On Friday, January 4, 2013 6:07:22 PM UTC+1, sventech wrote:
>
> Tesseract does not work well for fewer than 4 chars, I think, and your 
> image is very pixelated.
> Sven
>
> On Friday, January 4, 2013, Mike wrote:
>
>> Hi, I am still facing an issue where the number 8 is not detected,
>>
>> Here is a way to reproduce the problem using binaries downloaded from the 
>> tesseract site.
>> I downloaded the tesseract portable (
>> http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-3.02-win32-portable.zip&can=2&q=)
>> and ran the following command line with the image attached to this post:
>> tesseract.exe -l eng -psm 8 OCR_MONO_DEBUG.jpg test
>> In test.txt I get the following string: "/"
>> I would expect "8". I would really appreciate it if anyone could 
>> verify this behaviour on their side.
>>
>> Thanks in advance,
>> Mike
>>
>> On Thursday, September 13, 2012 12:19:13 PM UTC+2, Mike wrote:
>>>
>>> Hi,
>>>
>>> Thanks for the info. I am using revision 700. I tried what 
>>> "sventech" explained and it improved my results. I will integrate the 
>>> latest revision and see whether it gets even better.
>>>
>>> On Wednesday, September 12, 2012 11:09:29 PM UTC+2, Stane wrote:
>>>>
>>>> Does the example images work with your code?
>>>>
>>>> If I use the Tesseract 3.02 API to detect your image (white 8 on black 
>>>> background), it gets recognized without problems.
>>>> I am using the default PageSegMode and OEM_TESSERACT_ONLY.
>>>> Hope that helps somehow.
>>>>
>>>> On Monday, September 3, 2012 11:06:41 AM UTC+2, Mike wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Maybe someone can point me in the right direction.
>>>>> I use Windows 7 32 bit.
>>>>> When taking the attached image and loading it with tesseract.exe 
>>>>> (3.01) via the following command:
>>>>> tesseract.exe OCR_MONO_DEBUG.jpg test -l eng -psm 8
>>>>> the result is correct.
>>>>> However, in my program I use the following calls (where the image is the 
>>>>> attached file, read internally by my program and converted to 1-byte mono):
>>>>>
>>>>> pTessBase->SetPageSegMode(tesseract::PSM_SINGLE_WORD);
>>>>> pTessBase->SetImage(pImage, width, height, 1, width);
>>>>> char* ocr_result = pTessBase->GetUTF8Text();
>>>>>
>>>>> Then, oddly enough, I do not get any result; all I get is an empty 
>>>>> string. Setting the whitelist to digits only does not help either. When I 
>>>>> have 2 digits to recognize, such as 81, everything works fine.
>>>>>
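One thing that may be worth trying for a lone character, offered as an
assumption rather than a confirmed fix for this case, is to give the glyph
some margin before handing the buffer to SetImage. The sketch below copies a
tightly packed 1-byte-per-pixel buffer into a larger one with a uniform
border. MonoImage and pad_mono_image are hypothetical names and not part of
the Tesseract API; the fill value should match the image background.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container for a tightly packed 1-byte-per-pixel image.
struct MonoImage {
    std::vector<unsigned char> pixels;  // row-major, width bytes per row
    int width = 0;
    int height = 0;
};

// Hypothetical helper (not a Tesseract function): returns a copy of `src`
// surrounded by `border` pixels of value `fill` on every side.
MonoImage pad_mono_image(const unsigned char* src, int width, int height,
                         int border, unsigned char fill) {
    MonoImage out;
    out.width = width + 2 * border;
    out.height = height + 2 * border;
    // Start with a buffer that is entirely background colour.
    out.pixels.assign(static_cast<std::size_t>(out.width) * out.height, fill);
    // Copy each source row into the interior of the padded buffer.
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            std::size_t dst =
                static_cast<std::size_t>(y + border) * out.width + (x + border);
            out.pixels[dst] = src[static_cast<std::size_t>(y) * width + x];
        }
    }
    return out;
}
```

With a result named padded, the call from the post would then become
pTessBase->SetImage(padded.pixels.data(), padded.width, padded.height, 1,
padded.width); the bytes-per-line argument has to be the padded width.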
>>>>> Thanks in advance.
>>>>> Mike
>>>>>
>>>>  -- 
>> You received this message because you are subscribed to the Google
>> Groups "tesseract-ocr" group.
>> To post to this group, send email to [email protected]
>> To unsubscribe from this group, send email to
>> [email protected]
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en
>>
>
>
> -- 
> ``All that is gold does not glitter,
>   not all those who wander are lost;
> the old that is strong does not wither,
>   deep roots are not reached by the frost.
> From the ashes a fire shall be woken,
>   a light from the shadows shall spring;
> renewed shall be blade that was broken,
>   the crownless again shall be king.”
>

