I'm running tesseract .01. My OS is Windows 7. I added the files as you said, but when I run the command tesseract input output bazaar, it says it can't find the file eng.user-words. But the file is there.
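For anyone hitting the same message, here is a minimal sketch of how the file location could be double-checked from a script. It rests on assumptions that are not confirmed in this thread: that the bazaar config looks the word list up as tessdata/eng.user-words, and that TESSDATA_PREFIX (if set) points at the parent of the tessdata folder. The helper name user_words_candidates is made up for this illustration. It also lists similarly named files, since an editor-added extension such as .txt is one possible (unverified) reason a file that "is there" is still not found.

```python
import os


def user_words_candidates(tessdata_dir, lang="eng", suffix="user-words"):
    """Return (expected_path, exists, lookalikes) for a user-words file.

    Hypothetical helper: it assumes the file is looked up as
    <tessdata_dir>/<lang>.<suffix>, which is an assumption here.
    """
    expected_name = f"{lang}.{suffix}"
    expected_path = os.path.join(tessdata_dir, expected_name)
    lookalikes = []
    if os.path.isdir(tessdata_dir):
        for name in sorted(os.listdir(tessdata_dir)):
            # Collect names such as "eng.user-words.txt" that start like the
            # expected name but are not an exact match.
            if name != expected_name and name.lower().startswith(expected_name.lower()):
                lookalikes.append(name)
    return expected_path, os.path.isfile(expected_path), lookalikes


if __name__ == "__main__":
    # Assumption: TESSDATA_PREFIX is the parent directory of "tessdata".
    prefix = os.environ.get("TESSDATA_PREFIX", ".")
    path, found, similar = user_words_candidates(os.path.join(prefix, "tessdata"))
    print("expected:", path)
    print("found:", found)
    for name in similar:
        print("similar name (possible extra extension):", name)
```

If "found" is False while a similar name is listed, renaming the file so that it matches the expected name exactly would be the first thing to try.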
Thanks!

On Sun, Aug 12, 2012 at 4:37 PM, zdenko podobny <[email protected]> wrote:

> please post details (OS, tesseract version, exact error message...)
>
> --
> Zdenko
>
> On Sun, Aug 12, 2012 at 7:32 AM, Chathuri Gunawardhana <[email protected]> wrote:
>
>> I followed
>> http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html#_config_files_and_augmenting_with_user_data
>> But I'm getting error could not open user-data. User data file is actually in
>> correct location. But it says that file is not there. Any suggestions?
>>
>> Thanks!
>>
>> On Sat, Aug 11, 2012 at 6:48 PM, Chathuri Gunawardhana <[email protected]> wrote:
>>
>>> ---------- Forwarded message ----------
>>> From: zdenko podobny <[email protected]>
>>> Date: Sat, Aug 11, 2012 at 6:38 PM
>>> Subject: Re: Having traindata files uncombined
>>> To: [email protected]
>>>
>>> Yeah - it is much better ;-)
>>> Unfortunately at the moment I do not have time for deep testing so here
>>> are my suggestions:
>>>
>>> - if you are using tesseract via api, try to set rectangles (instead
>>>   of whole image) with coords of city names to avoid "noise" (e.g.
>>>   contours) from map. tesseract is "noise sensitive" and noise can
>>>   decrease ocr quality
>>> - if you are using tesseract executable try to extract city names to
>>>   individual images
>>> - after this you can start to play with dictionaries ;-)
>>> - you can use user_words "outside" of traineddata file see [1]
>>> - try to play with page segmentation parameter (psm)
>>> - I am not aware how to increase (or decrease) strength of
>>>   dictionaries in tesseract 3.02 (e.g. to force tesseract to output only
>>>   words from dictionaries...)
>>>
>>> I believe after this you can at least evaluate if tesseract is suitable
>>> for your task...
>>>
>>> [1]
>>> http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html#_config_files_and_augmenting_with_user_data
>>>
>>> --
>>> Zdenko
>>>
>>> On Sat, Aug 11, 2012 at 2:23 PM, Chathuri Gunawardhana <[email protected]> wrote:
>>>
>>>> actually you can use this image under
>>>> http://www.taprobanetravels.com/images/map-of-sri-lanka.jpg. It is
>>>> high quality than above.
>>>>
>>>> On Sat, Aug 11, 2012 at 4:40 PM, zdenko podobny <[email protected]> wrote:
>>>>
>>>>> On Sat, Aug 11, 2012 at 12:58 PM, Chathuri Gunawardhana <[email protected]> wrote:
>>>>>
>>>>>> Image that I'm trying to identify is attached. Most words in here are
>>>>>> not identified correctly. I added these words to user words and combined.
>>>>>> But still didn't get the expected output.
>>>>>
>>>>> your attached image has insufficient quality - I get no output for
>>>>> it...
>>>>>
>>>>> --
>>>>> Zdenko
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To post to this group, send email to [email protected]
>>>>> To unsubscribe from this group, send email to
>>>>> [email protected]
>>>>> For more options, visit this group at
>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>
>>> --
>>> Chathuri Gunawardhana
>>> Undergraduate at University of Moratuwa
>>> Sri Lanka

--
Chathuri Gunawardhana
Undergraduate at University of Moratuwa
Sri Lanka

