Please post details (OS, tesseract version, exact error message...).

--
Zdenko

On Sun, Aug 12, 2012 at 7:32 AM, Chathuri Gunawardhana <[email protected]> wrote:

> I followed
> http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html#_config_files_and_augmenting_with_user_data
> but I'm getting a "could not open user-data" error. The user data file is
> actually in the correct location, but it says that the file is not there.
> Any suggestions?
>
> Thanks!
>
> On Sat, Aug 11, 2012 at 6:48 PM, Chathuri Gunawardhana <
> [email protected]> wrote:
>
>>
>> ---------- Forwarded message ----------
>> From: zdenko podobny <[email protected]>
>> Date: Sat, Aug 11, 2012 at 6:38 PM
>> Subject: Re: Having traindata files uncombined
>> To: [email protected]
>>
>>
>> Yeah - it is much better ;-)
>> Unfortunately, at the moment I do not have time for deep testing, so here
>> are my suggestions:
>>
>> - if you are using tesseract via the API, try to set rectangles (instead
>>   of the whole image) with the coordinates of the city names to avoid
>>   "noise" (e.g. contours) from the map. Tesseract is "noise sensitive",
>>   and noise can decrease OCR quality
>> - if you are using the tesseract executable, try to extract the city names
>>   to individual images
>> - after this you can start to play with dictionaries ;-)
>> - you can use user_words "outside" of the traineddata file, see [1]
>> - try to play with the page segmentation parameter (psm)
>> - I am not aware of a way to increase (or decrease) the strength of
>>   dictionaries in tesseract 3.02 (e.g. to force tesseract to output only
>>   words from dictionaries...)
>>
>> I believe after this you can at least evaluate whether tesseract is
>> suitable for your task...
>>
>> [1]
>> http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html#_config_files_and_augmenting_with_user_data
>>
>> --
>> Zdenko
>>
>> On Sat, Aug 11, 2012 at 2:23 PM, Chathuri Gunawardhana <
>> [email protected]> wrote:
>>
>>> Actually, you can use the image at
>>> http://www.taprobanetravels.com/images/map-of-sri-lanka.jpg
>>> instead. It is higher quality than the one above.
>>>
>>>
>>> On Sat, Aug 11, 2012 at 4:40 PM, zdenko podobny <[email protected]> wrote:
>>>
>>>>
>>>> On Sat, Aug 11, 2012 at 12:58 PM, Chathuri Gunawardhana <
>>>> [email protected]> wrote:
>>>>
>>>>> The image that I'm trying to identify is attached. Most words in it are
>>>>> not identified correctly. I added these words to the user words and
>>>>> combined the traineddata files, but I still didn't get the expected
>>>>> output.
>>>>>
>>>>>
>>>> Your attached image has insufficient quality - I get no output for it...
>>>>
>>>> --
>>>> Zdenko
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To post to this group, send email to [email protected]
>>>> To unsubscribe from this group, send email to
>>>> [email protected]
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>
>>>
>>>
>>
>>
>> --
>> Chathuri Gunawardhana
>> Undergraduate at University of Moratuwa
>> Sri Lanka
>>
>
>
> --
> Chathuri Gunawardhana
> Undergraduate at University of Moratuwa
> Sri Lanka
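Zdenko's suggestion to keep user words "outside" of the traineddata file, and to experiment with psm, can be sketched in Python. This is a minimal sketch, not Tesseract's own tooling. It assumes the layout shown in the tesseract(1) man page linked as [1], where a config file under `tessdata/configs/` sets `user_words_suffix` and the word list sits next to `<lang>.traineddata` as `<lang>.<suffix>`. It also assumes the 3.02 command-line form `tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]`. The helper names, the config name `citynames`, the image name `city.png` and the sample words are hypothetical. The code only writes files and builds an argument list; it does not run Tesseract. In real use, `tessdata_dir` would be the directory that already holds `eng.traineddata`, not a temporary one.

```python
"""Sketch of a Tesseract 3.02-style user-words setup (hypothetical helpers).

Assumed layout, following the man page example:
  <tessdata>/configs/<config_name>   variable/value pairs, space separated
  <tessdata>/<lang>.<suffix>         word list, one word per line
"""
import os
import tempfile


def write_user_words_setup(tessdata_dir, lang, config_name, words,
                           suffix="user-words"):
    """Write the config file and the word list; return both paths."""
    configs_dir = os.path.join(tessdata_dir, "configs")
    os.makedirs(configs_dir, exist_ok=True)

    # Config file: one "variable value" pair per line.
    config_path = os.path.join(configs_dir, config_name)
    with open(config_path, "w", encoding="utf-8") as f:
        # As in the man page example these two lines switch off the
        # built-in dictionaries; remove them to keep those dictionaries.
        f.write("load_system_dawg F\n")
        f.write("load_freq_dawg F\n")
        # Tells Tesseract to load <lang>.<suffix> as the user word list.
        f.write(f"user_words_suffix {suffix}\n")

    # Word list named <lang>.<suffix>, placed next to <lang>.traineddata.
    words_path = os.path.join(tessdata_dir, f"{lang}.{suffix}")
    with open(words_path, "w", encoding="utf-8") as f:
        f.write("\n".join(words) + "\n")
    return config_path, words_path


def build_command(image, output_base, lang, config_name, psm=None):
    """Build (but do not run) a tesseract 3.02 argument list."""
    argv = ["tesseract", image, output_base, "-l", lang]
    if psm is not None:
        # 3.02 spells the option "-psm"; 7 = treat the image as one text line.
        argv += ["-psm", str(psm)]
    argv.append(config_name)  # config file names come last
    return argv


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tessdata:
        cfg, wl = write_user_words_setup(
            tessdata, "eng", "citynames", ["Colombo", "Kandy", "Galle"])
        print(cfg)
        print(wl)
        print(" ".join(
            build_command("city.png", "out", "eng", "citynames", psm=7)))
```

Under these assumptions, a word list that sits in the right folder but is not named `<lang>.<suffix>` (here `eng.user-words`), or a config name that does not match a file under `tessdata/configs/`, would not be picked up, so both names are worth checking against the error quoted above. For individual city-name crops, `-psm 7` (single text line) or `-psm 8` (single word) may be worth trying.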

