Re: Improve results from attached image

Mike Mon, 14 Jan 2013 03:17:45 -0800

I just realized that the same image returns different results depending on 
the tesseract version. With version 3.01 
(http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-3.01-win32-portable.zip&can=2&q=)
everything is recognized correctly, whereas with version 3.02 it is not. So, 
at least for this image, the code changes between 3.01 and 3.02 have decreased accuracy.
Mike
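
To compare the versions and psm values side by side, something like the following could generate one command per mode. This is only a sketch: the helper names and the test_psmN output names are made up for illustration, and it assumes the 3.0x argument order and the "-psm" spelling used in the commands in this thread.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build a tesseract 3.0x command line in the order used in this thread:
//   <exe> <image> <output base> -l <lang> -psm <mode>
// Hypothetical helper: it does no shell quoting, so paths with spaces
// are not handled.
std::string build_tesseract_command(const std::string& exe,
                                    const std::string& image,
                                    const std::string& output_base,
                                    const std::string& lang,
                                    int psm) {
    assert(psm >= 0 && psm <= 10);  // page segmentation modes 0..10 in 3.0x
    return exe + " " + image + " " + output_base +
           " -l " + lang + " -psm " + std::to_string(psm);
}

// One command per mode, each writing to its own output file
// (test_psm5.txt, test_psm6.txt, ...), so the results can be compared.
std::vector<std::string> commands_for_modes(const std::string& exe,
                                            const std::string& image,
                                            const std::string& lang,
                                            const std::vector<int>& modes) {
    std::vector<std::string> commands;
    for (int psm : modes) {
        commands.push_back(build_tesseract_command(
            exe, image, "test_psm" + std::to_string(psm), lang, psm));
    }
    return commands;
}
```

Calling commands_for_modes once with the 3.01 executable and once with the 3.02 executable, for modes 5, 6, 8, 9 and 10, gives two sets of commands whose output files can be compared directly.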


On Friday, January 4, 2013 6:52:13 PM UTC+1, Mike wrote:
>
> Thanks for the info,
>
> oddly enough tesseract recognizes the 8 perfectly well for psm values 5 
> and 6. I could imagine there is more to this than just a bad source image. 
> The 8 must be interpreted in a different way for psm 9 and 10.
>
> Mike
>
> On Friday, January 4, 2013 6:07:22 PM UTC+1, sventech wrote:
>>
>> Tesseract does not work well for fewer than 4 chars, I think, and your 
>> image is very pixelated.
>> Sven
>>
>> On Friday, January 4, 2013, Mike wrote:
>>
>>> Hi, I am still facing an issue where the number 8 is not detected.
>>>
>>> Here is a way to reproduce the problem using binaries downloaded from 
>>> the tesseract site.
>>> I downloaded the tesseract portable (
>>> http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-3.02-win32-portable.zip&can=2&q=)
>>> and ran the following command line with the image attached to this post:
>>> tesseract.exe -l eng -psm 8 OCR_MONO_DEBUG.jpg test
>>> In test.txt I get the string "/".
>>> I would expect "8". I would really appreciate it if anyone could 
>>> verify this behaviour on their side.
>>>
>>> Thanks in advance,
>>> Mike
>>>
>>> On Thursday, September 13, 2012 12:19:13 PM UTC+2, Mike wrote:
>>>>
>>>> Hi,
>>>>
>>>> Thanks for the info. I am using revision 700. I tried what 
>>>> "sventech" explained and it improved my results. I will integrate the 
>>>> latest revision and see if the results get even better.
>>>>
>>>> On Wednesday, September 12, 2012 11:09:29 PM UTC+2, Stane wrote:
>>>>>
>>>>> Do the example images work with your code?
>>>>>
>>>>> If I use the tesseract 3.02 API to recognize your image (white 8 on 
>>>>> black background), it gets recognized without problems.
>>>>> I am using the default PageSegMode and OEM_TESSERACT_ONLY.
>>>>> Hope that helps somehow.
>>>>>
>>>>> On Monday, September 3, 2012 11:06:41 AM UTC+2, Mike wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> maybe someone can point me in the right direction.
>>>>>> I use Windows 7 32 bit.
>>>>>> When I take the attached image and load it with tesseract.exe 
>>>>>> (3.01) via the following command:
>>>>>> tesseract.exe OCR_MONO_DEBUG.jpg test -l eng -psm 8
>>>>>> the result is correct.
>>>>>> However, I use the following functions (where image is the attached 
>>>>>> file read internally by my program and converted to 1 byte mono):
>>>>>>
>>>>>> pTessBase->SetPageSegMode(tesseract::PSM_SINGLE_WORD);
>>>>>> pTessBase->SetImage(pImage, width, height, 1, width);
>>>>>> char* ocr_result = pTessBase->GetUTF8Text();
>>>>>>
>>>>>> Then, oddly enough, I do not get any results; all I get is an empty 
>>>>>> string. Setting the whitelist to only numbers does not help either. When I 
>>>>>> have 2 numbers to recognize, such as 81, everything works fine.
>>>>>>
>>>>>> Thanks in advance.
>>>>>> Mike
>>>>>>
>>>>>  -- 
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To post to this group, send email to [email protected]
>>> To unsubscribe from this group, send email to
>>> [email protected]
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>
>>
>>
>
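
The SetImage call quoted above passes 1 as bytes_per_pixel and width as bytes_per_line, which only matches a buffer whose rows are tightly packed. The following is a minimal sketch of one way the "converted to 1 byte mono" step could produce such a buffer. It assumes a packed 24-bit RGB source whose rows may be padded, which the thread does not state; the function name and the integer luma weights are illustrative choices, not what the original program does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert a 24-bit RGB buffer into a tightly packed 8-bit gray buffer.
// src_stride is the number of bytes per source row; it may be larger than
// width * 3 when rows are padded (common for Windows bitmaps).
// Hypothetical helper, not part of the tesseract API.
std::vector<std::uint8_t> rgb_to_packed_gray(const std::uint8_t* rgb,
                                             int width, int height,
                                             std::size_t src_stride) {
    assert(rgb != nullptr && width > 0 && height > 0);
    assert(src_stride >= static_cast<std::size_t>(width) * 3);

    std::vector<std::uint8_t> gray(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y) {
        const std::uint8_t* row = rgb + static_cast<std::size_t>(y) * src_stride;
        for (int x = 0; x < width; ++x) {
            const unsigned r = row[3 * x];
            const unsigned g = row[3 * x + 1];
            const unsigned b = row[3 * x + 2];
            // Integer luma approximation; the weights sum to 256.
            gray[static_cast<std::size_t>(y) * width + x] =
                static_cast<std::uint8_t>((77 * r + 150 * g + 29 * b) >> 8);
        }
    }
    return gray;  // rows are exactly `width` bytes long
}
```

With a buffer built this way, the call from the original post would become pTessBase->SetImage(gray.data(), width, height, 1, width), and the last argument is then correct by construction.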

