Re: Having traindata files uncombined

Chathuri Gunawardhana Sun, 12 Aug 2012 03:35:36 -0700

I followed
http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html#_config_files_and_augmenting_with_user_data
but I'm getting an error saying it could not open user-data. The user data file
is actually in the correct location, yet tesseract says the file is not there.
Any suggestions?
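
One way to narrow this down is to check which file name tesseract would
actually look for. The sketch below is only an illustration, not the tool's
own logic: it assumes the convention described in the linked man page section,
where a config file sets user_words_suffix and tesseract then looks for
<tessdata>/<lang>.<suffix>. The helper names, the "tessdata" directory and the
"bazaar" config name are hypothetical, and the -psm flag is the 3.0x
command-line spelling as far as I know.

```python
import os


def user_words_path(tessdata_dir, lang="eng", suffix="user-words"):
    """Build the path <tessdata_dir>/<lang>.<suffix> (assumed lookup convention)."""
    return os.path.join(tessdata_dir, "%s.%s" % (lang, suffix))


def build_command(image, output_base, lang="eng", psm=None, configs=()):
    """Assemble a tesseract command line as a list of arguments."""
    cmd = ["tesseract", image, output_base, "-l", lang]
    if psm is not None:
        # Page segmentation mode; "-psm" is assumed for tesseract 3.0x.
        cmd += ["-psm", str(psm)]
    # Config file names go last on the command line.
    cmd += list(configs)
    return cmd


if __name__ == "__main__":
    path = user_words_path("tessdata")  # hypothetical tessdata directory
    print("expected user words file:", path)
    print("exists:", os.path.isfile(path))
    print(" ".join(build_command("city.png", "out", psm=7, configs=["bazaar"])))
```

If the printed path does not exist, the file name or the tessdata directory is
probably not what tesseract expects, even when the file itself is present
somewhere nearby.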


Thanks!

On Sat, Aug 11, 2012 at 6:48 PM, Chathuri Gunawardhana <
[email protected]> wrote:

>
>
> ---------- Forwarded message ----------
> From: zdenko podobny <[email protected]>
> Date: Sat, Aug 11, 2012 at 6:38 PM
> Subject: Re: Having traindata files uncombined
> To: [email protected]
>
>
> Yeah - it is much better ;-)
> Unfortunately at the moment I do not have time for deep testing so here
> are my suggestions:
>
>    - If you are using tesseract via the API, try to set rectangles (instead
>    of the whole image) with the coordinates of the city names to avoid
>    "noise" (e.g. contours) from the map. Tesseract is noise sensitive, and
>    noise can decrease OCR quality.
>    - If you are using the tesseract executable, try to extract the city
>    names into individual images.
>    - After this you can start to play with dictionaries ;-)
>    - You can use user_words "outside" of the traineddata file, see [1].
>    - Try to play with the page segmentation mode parameter (psm).
>    - I do not know how to increase (or decrease) the strength of
>    dictionaries in tesseract 3.02 (e.g. to force tesseract to output only
>    words from dictionaries...).
>
> I believe after this you can at least evaluate if tesseract is suitable
> for your task...
>
> [1]
> http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html#_config_files_and_augmenting_with_user_data
>
> --
> Zdenko
>
> On Sat, Aug 11, 2012 at 2:23 PM, Chathuri Gunawardhana <
> [email protected]> wrote:
>
>> Actually, you can use the image at
>> http://www.taprobanetravels.com/images/map-of-sri-lanka.jpg instead. It is
>> higher quality than the one above.
>>
>>
>> On Sat, Aug 11, 2012 at 4:40 PM, zdenko podobny <[email protected]> wrote:
>>
>>>
>>> On Sat, Aug 11, 2012 at 12:58 PM, Chathuri Gunawardhana <
>>> [email protected]> wrote:
>>>
>>>> The image that I'm trying to recognize is attached. Most of the words in
>>>> it are not recognized correctly. I added these words to the user words and
>>>> combined the traineddata, but I still didn't get the expected output.
>>>>
>>>>
>>> Your attached image has insufficient quality; I get no output for it.
>>>
>>> --
>>> Zdenko
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To post to this group, send email to [email protected]
>>> To unsubscribe from this group, send email to
>>> [email protected]
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>
>>
>
>
>
> --
> Chathuri Gunawardhana
> Undergraduate at University of Moratuwa
> Sri Lanka
>



-- 
Chathuri Gunawardhana
Undergraduate at University of Moratuwa
Sri Lanka

