Please attach a copy of the image so that I can try.

ShreeDevi
On Tue, Nov 11, 2014 at 9:43 PM, <[email protected]> wrote:

> I was in PSM_SINGLE_LINE mode indeed, because my text is already segmented
> into lines, and changing to PSM_AUTO does help with the I-I issue, but I
> have to say that the overall quality is still better with PSM_SINGLE_LINE.
> With PSM_AUTO I start getting all kinds of punctuation and other errors. I
> also tried disabling chopping, leading to disastrous results. My glyphs are
> not guaranteed to not touch.
> I am still perplexed though how tesseract ends up preferring I-Iercules
> instead of Hercules, when Hercules is a dictionary word and the I-I -> H
> ambig rule is in place...
>
> On Thursday, November 6, 2014 7:48:00 PM UTC-5, [email protected] wrote:
>>
>> Hello,
>> I'm using tesseract 3.02 on Windows 7 and I started with the
>> eng.traineddata that was distributed with 3.02.
>> Tesseract keeps misreading some symbols, specifically 6 instead of G, I-I
>> instead of H and a few others, so I'm getting 6od instead of God,
>> I-Iercules instead of Hercules and so on. I was hoping that using the
>> dictionary would help with this so I wouldn't have to retrain, because
>> after all it's just these few symbols, but nothing seems to help. So far
>> I've tried:
>>
>> - Cranking up the language_model_penalty_non_dict_word and
>>   language_model_penalty_non_freq_dict_word values in the config file
>> - Adding "load_system_dawg T" and "load_freq_dawg T" to the config file
>>   (even though it's supposed to do that by default)
>> - Adding the 6->G rule to unicharambigs (as "1 6 1 G 0") and recombining.
>>   The I-I -> H rule was already there.
>> - Adding the words God and Hercules to the frequent word list and
>>   recombining (eng.freq-dawg).
>> - Emptying both the word list (eng.word-dawg) and frequent word list
>>   (eng.freq-dawg) and putting just these two words in and recombining, just
>>   to see if it would make a difference. It didn't.
>>
>> Nothing I've done so far has helped, but it seems to me that the point of
>> using the dictionary is to deal with exactly this type of a situation, so I
>> feel like I must be missing something. Have I maybe missed a configuration
>> step?
>>
>> Thanks
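
For reference, the behaviour the original poster was hoping for (an ambiguity rule such as I-I -> H or 6 -> G being applied whenever the result is a dictionary word) can be approximated outside Tesseract as a post-processing pass over the recognized text. The sketch below is only an illustration under stated assumptions: it is not how Tesseract applies unicharambigs internally, and the names AMBIGS, WORDS, candidates and correct are hypothetical, with a two-word stand-in dictionary taken from the examples in the thread.

```python
# Hypothetical post-OCR correction pass; NOT Tesseract's internal logic.
# AMBIGS and WORDS are illustrative stand-ins based on the thread's examples.
AMBIGS = [("I-I", "H"), ("6", "G")]  # (misread sequence, intended glyph)
WORDS = {"God", "Hercules"}          # stand-in for a real word list


def candidates(token):
    """Yield the token and every variant reachable by ambig substitutions."""
    seen = {token}
    queue = [token]
    while queue:
        current = queue.pop()
        yield current
        for wrong, right in AMBIGS:
            start = current.find(wrong)
            while start != -1:
                # Replace one occurrence at a time so partial fixes are tried.
                variant = current[:start] + right + current[start + len(wrong):]
                if variant not in seen:
                    seen.add(variant)
                    queue.append(variant)
                start = current.find(wrong, start + 1)


def correct(token):
    """Return a dictionary word reachable via ambig rules, else the token."""
    for candidate in candidates(token):
        if candidate in WORDS:
            return candidate
    return token  # leave unknown tokens (e.g. real numbers) untouched


if __name__ == "__main__":
    for raw in ["I-Iercules", "6od", "1956"]:
        print(raw, "->", correct(raw))
```

Because a substitution is only accepted when it produces a word from the list, a token such as 1956 is left alone even though it contains a 6. This works on whole tokens only; punctuation attached to a word would need to be stripped first.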

