I followed http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html#_config_files_and_augmenting_with_user_data, but I'm getting a "could not open user-data" error. The user data file is in the correct location, yet Tesseract reports that the file is not there. Any suggestions?
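
One way to narrow this down is to check the path independently of Tesseract. The following is only a minimal sketch: it assumes the layout described in the linked man page section, where the user words file sits next to the language data as `<lang>.<suffix>` (for example `eng.user-words`) and the suffix is whatever the config file sets in `user_words_suffix`. The function names and the example directory are hypothetical, and the script does not call Tesseract at all.

```python
import os
import tempfile


def expected_user_words_path(tessdata_dir, lang="eng", suffix="user-words"):
    """Build <tessdata_dir>/<lang>.<suffix>.

    Assumption: this mirrors the layout in the man page section linked above;
    `suffix` should match the user_words_suffix value in your config file.
    """
    return os.path.join(tessdata_dir, f"{lang}.{suffix}")


def diagnose(path):
    """Return 'ok', or a short reason why opening `path` would likely fail."""
    if not os.path.exists(path):
        return "missing"
    if not os.path.isfile(path):
        return "not a regular file"
    if not os.access(path, os.R_OK):
        return "not readable"
    return "ok"


if __name__ == "__main__":
    # Hypothetical demo in a throwaway directory; substitute your real
    # tessdata directory, language code, and suffix.
    demo_dir = tempfile.mkdtemp()
    path = expected_user_words_path(demo_dir, "eng", "user-words")
    print(path, "->", diagnose(path))
```

If this reports "missing" for the real directory, the file name (language prefix or suffix) or the directory Tesseract is actually reading from is the likely culprit rather than the file contents.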
Thanks!

On Sat, Aug 11, 2012 at 6:48 PM, Chathuri Gunawardhana <[email protected]> wrote:

> ---------- Forwarded message ----------
> From: zdenko podobny <[email protected]>
> Date: Sat, Aug 11, 2012 at 6:38 PM
> Subject: Re: Having traindata files uncombined
> To: [email protected]
>
> Yeah - it is much better ;-)
> Unfortunately at the moment I do not have time for deep testing so here
> are my suggestions:
>
>    - if you are using tesseract via api, try to set rectangles (instead
>    of whole image) with coords of city names to avoid "noise" (e.g. contours)
>    from map. tesseract is "noise sensitive" and noise can decrease ocr quality
>    - if you are using tesseract executable try to extract city names to
>    individual images
>    - after this you can start to play with dictionaries ;-)
>    - you can use user_words "outside" of traineddata file see [1]
>    - try to play with page segmentation parameter (psm)
>    - I am not aware how to increase (or decrease) strength of
>    dictionaries in tesseract 3.02 (e.g. to force tesseract to output only
>    words from dictionaries...)
>
> I believe after this you can at least evaluate if tesseract is suitable
> for your task...
>
> [1]
> http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html#_config_files_and_augmenting_with_user_data
>
> --
> Zdenko
>
> On Sat, Aug 11, 2012 at 2:23 PM, Chathuri Gunawardhana <
> [email protected]> wrote:
>
>> actually you can use this image under
>> http://www.taprobanetravels.com/images/map-of-sri-lanka.jpg. It is high
>> quality than above.
>>
>> On Sat, Aug 11, 2012 at 4:40 PM, zdenko podobny <[email protected]> wrote:
>>
>>> On Sat, Aug 11, 2012 at 12:58 PM, Chathuri Gunawardhana <
>>> [email protected]> wrote:
>>>
>>>> Image that I'm trying to identify is attached. Most words in here are
>>>> not identified correctly. I added these words to user words and combined.
>>>> But still didn't get the expected output.
>>>
>>> your attached image has insufficient quality - I get no output for it...
>>>
>>> --
>>> Zdenko
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To post to this group, send email to [email protected]
>>> To unsubscribe from this group, send email to
>>> [email protected]
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en
>
> --
> Chathuri Gunawardhana
> Undergraduate at University of Moratuwa
> Sri Lanka

--
Chathuri Gunawardhana
Undergraduate at University of Moratuwa
Sri Lanka

