Re: Having traindata files uncombined

zdenko podobny Sun, 12 Aug 2012 04:07:40 -0700

please post details (OS, tesseract version, exact error message...)
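
One way to gather those details in a single pasteable block is sketched below. It is only an illustration: the helper names `tesseract_version` and `build_report` are hypothetical, and it assumes the `tesseract` binary is on PATH and accepts `--version` (older builds may only accept `-v`).

```python
import platform
import shutil
import subprocess


def tesseract_version(executable="tesseract"):
    """Return the version banner, or a note if the binary is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return f"{executable}: not found on PATH"
    # Assumption: the binary accepts --version. Capture both streams,
    # since the banner may be printed to either stdout or stderr.
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return (result.stdout + result.stderr).strip()


def build_report(command_line, error_text):
    """Assemble OS, version, command and error text into one block."""
    return "\n".join([
        f"OS: {platform.platform()}",
        f"Tesseract: {tesseract_version()}",
        f"Command: {command_line}",
        f"Error: {error_text}",
    ])


if __name__ == "__main__":
    # Placeholders: paste the real command and the exact error text here.
    print(build_report("<exact command>", "<exact error message>"))
```

The error text is passed in by hand on purpose, so that the exact message is copied rather than paraphrased.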

-- 
Zdenko


On Sun, Aug 12, 2012 at 7:32 AM, Chathuri Gunawardhana <
[email protected]> wrote:

> I followed
> http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html#_config_files_and_augmenting_with_user_data
> but I'm getting an error saying it could not open user-data. The user data
> file is actually in the correct location, but it says the file is not
> there. Any suggestions?
>
> Thanks!
>
> On Sat, Aug 11, 2012 at 6:48 PM, Chathuri Gunawardhana <
> [email protected]> wrote:
>
>>
>>
>> ---------- Forwarded message ----------
>> From: zdenko podobny <[email protected]>
>> Date: Sat, Aug 11, 2012 at 6:38 PM
>> Subject: Re: Having traindata files uncombined
>> To: [email protected]
>>
>>
>> Yeah - it is much better ;-)
>> Unfortunately at the moment I do not have time for deep testing so here
>> are my suggestions:
>>
>>    - if you are using tesseract via the API, try to set rectangles (instead
>>    of the whole image) with the coordinates of the city names, to avoid
>>    "noise" (e.g. contours) from the map. tesseract is noise-sensitive, and
>>    noise can decrease OCR quality
>>    - if you are using the tesseract executable, try to extract the city
>>    names into individual images
>>    - after this you can start to play with dictionaries ;-)
>>    - you can use user_words "outside" of the traineddata file, see [1]
>>    - try to play with the page segmentation mode parameter (psm)
>>    - I am not aware of a way to increase (or decrease) the strength of
>>    dictionaries in tesseract 3.02 (e.g. to force tesseract to output only
>>    words from dictionaries...)
>>
>> I believe after this you can at least evaluate if tesseract is suitable
>> for your task...
>>
>> [1]
>> http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html#_config_files_and_augmenting_with_user_data
>>
>> --
>> Zdenko
>>
>> On Sat, Aug 11, 2012 at 2:23 PM, Chathuri Gunawardhana <
>> [email protected]> wrote:
>>
>>> Actually, you can use this image instead:
>>> http://www.taprobanetravels.com/images/map-of-sri-lanka.jpg
>>> It is higher quality than the one above.
>>>
>>>
>>> On Sat, Aug 11, 2012 at 4:40 PM, zdenko podobny <[email protected]> wrote:
>>>
>>>>
>>>> On Sat, Aug 11, 2012 at 12:58 PM, Chathuri Gunawardhana <
>>>> [email protected]> wrote:
>>>>
>>>>> The image that I'm trying to recognize is attached. Most of the words in
>>>>> it are not identified correctly. I added these words to the user words
>>>>> and combined the files, but I still didn't get the expected output.
>>>>>
>>>>>
>>>> your attached image has insufficient quality - I get no output for it...
>>>>
>>>> --
>>>> Zdenko
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To post to this group, send email to [email protected]
>>>> To unsubscribe from this group, send email to
>>>> [email protected]
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>
>>>
>>
>>
>>
>> --
>> Chathuri Gunawardhana
>> Undergraduate at University of Moratuwa
>> Sri Lanka
>>
>
>
>
> --
> Chathuri Gunawardhana
> Undergraduate at University of Moratuwa
> Sri Lanka
>
>
