Yeah - it is much better ;-) Unfortunately at the moment I do not have time for deep testing so here are my suggestions:
- If you are using Tesseract via the API, try to set rectangles (instead of passing the whole image) with the coordinates of the city names, to avoid "noise" (e.g. contours) from the map. Tesseract is noise sensitive, and noise can decrease OCR quality.
- If you are using the tesseract executable, try to extract the city names into individual images.
- After this you can start to play with dictionaries ;-)
- You can use user_words "outside" of the traineddata file, see [1].
- Try to play with the page segmentation mode parameter (psm).
- I am not aware of a way to increase (or decrease) the strength of dictionaries in Tesseract 3.02 (e.g. to force Tesseract to output only words from dictionaries...).

I believe after this you can at least evaluate whether Tesseract is suitable for your task...

[1] http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html#_config_files_and_augmenting_with_user_data

--
Zdenko

On Sat, Aug 11, 2012 at 2:23 PM, Chathuri Gunawardhana <[email protected]> wrote:

> actually you can use this image under
> http://www.taprobanetravels.com/images/map-of-sri-lanka.jpg. It is high
> quality than above.
>
> On Sat, Aug 11, 2012 at 4:40 PM, zdenko podobny <[email protected]> wrote:
>
>> On Sat, Aug 11, 2012 at 12:58 PM, Chathuri Gunawardhana <[email protected]> wrote:
>>
>>> Image that I'm trying to identify is attached. Most words in here are
>>> not identified correctly. I added these words to user words and combined.
>>> But still didn't get the expected output.
>>
>> your attached image has insufficient quality - I get no output for it...
>>
>> --
>> Zdenko
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "tesseract-ocr" group.
>> To post to this group, send email to [email protected]
>> To unsubscribe from this group, send email to
>> [email protected]
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en
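
For the executable route suggested above, a rough sketch of how the per-city crops, the psm setting and an external user-words list might be combined. Everything named here is an assumption for illustration: the config name "cities", the crop file names, and the choice of psm 7 (single text line). The `user_words_suffix` variable and the `-psm` flag are taken from the Tesseract 3.02 manual page linked in [1]; as far as I understand it, the config file would go into tessdata/configs/ and the word list into tessdata/eng.user-words, so please check that against your installation.

```python
import subprocess

# Contents of a hypothetical config file tessdata/configs/cities.
# With this variable Tesseract is expected to load tessdata/<lang>.user-words
# (e.g. eng.user-words, one city name per line).
USER_WORDS_CONFIG = "user_words_suffix user-words\n"


def build_command(image_path, output_base, psm=7, lang="eng", config="cities"):
    """Build a Tesseract 3.02 style command line for one cropped city name.

    psm 7 treats the image as a single text line, psm 8 as a single word.
    Newer Tesseract versions spell the flag --psm instead of -psm.
    """
    return ["tesseract", image_path, output_base,
            "-l", lang, "-psm", str(psm), config]


def ocr_crops(crop_paths, psm=7):
    """Run tesseract once per crop; output goes to city_<n>.txt."""
    for index, path in enumerate(crop_paths):
        subprocess.check_call(build_command(path, "city_%d" % index, psm=psm))


if __name__ == "__main__":
    # Hypothetical crops, cut out of the map beforehand with any image tool.
    for crop in ["colombo.png", "kandy.png"]:
        print(" ".join(build_command(crop, crop[:-4])))
```

Running each city name as its own small image keeps the map contours out of the input, and trying psm 7 against psm 8 on the same crops is a cheap way to see which segmentation mode suits the labels.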

