Please attach a copy of the image so that I can try.

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Nov 11, 2014 at 9:43 PM, <[email protected]> wrote:

> I was in PSM_SINGLE_LINE mode indeed, because my text is already segmented
> into lines, and changing to PSM_AUTO does help with the I-I issue, but I
> have to say that the overall quality is still better with PSM_SINGLE_LINE.
> With PSM_AUTO I start getting all kinds of punctuation and other errors. I
> also tried disabling chopping, which gave disastrous results. My glyphs
> may touch one another.
> I am still perplexed, though, as to how Tesseract ends up preferring
> I-Iercules over Hercules when Hercules is a dictionary word and the
> I-I -> H ambig rule is in place...
>
> On Thursday, November 6, 2014 7:48:00 PM UTC-5, [email protected] wrote:
>>
>> Hello,
>> I'm using tesseract 3.02 on Windows 7 and I started with the
>> eng.traineddata that was distributed with 3.02.
>> Tesseract keeps misreading some symbols, specifically 6 instead of G, I-I
>> instead of H and a few others, so I'm getting 6od instead of God,
>> I-Iercules instead of Hercules and so on. I was hoping that using the
>> dictionary would help with this so I wouldn't have to retrain, because
>> after all it's just these few symbols, but nothing seems to help. So far
>> I've tried:
>>
>> - Cranking up the language_model_penalty_non_dict_word and
>>   language_model_penalty_non_freq_dict_word values in the config file
>> - Adding "load_system_dawg T" and "load_freq_dawg T" to the config file
>>   (even though it's supposed to do that by default)
>> - Adding the 6->G rule to unicharambigs (as "1 6 1 G 0") and recombining.
>>   The I-I -> H rule was already there.
>> - Adding the words God and Hercules to the frequent word list and
>>   recombining (eng.freq-dawg).
>> - Emptying both the word list (eng.word-dawg) and frequent word list
>>   (eng.freq-dawg) and putting just these two words in and recombining,
>>   just to see if it would make a difference. It didn't.
>>
>> Nothing I've done so far has helped, but it seems to me that the point of
>> using the dictionary is to deal with exactly this type of situation, so I
>> feel like I must be missing something. Have I missed a configuration
>> step?
>>
>> Thanks
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/2906d195-bc75-4b68-ad97-49f69221d106%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/2906d195-bc75-4b68-ad97-49f69221d106%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

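For anyone who wants to reproduce the config-file part of the quoted message, here is a minimal sketch that writes the four variables named there into a Tesseract config file, one "name value" pair per line. The variable names come from the quoted message; the numeric penalty values and the file name "dictfix" are placeholders assumed for illustration, not recommended settings, and the sketch does not check whether a given variable actually takes effect when set this way.

```python
from pathlib import Path

# Variable names are taken from the quoted message. The numeric values are
# placeholders chosen for illustration only, not tuned or recommended settings.
SETTINGS = {
    "load_system_dawg": "T",
    "load_freq_dawg": "T",
    "language_model_penalty_non_dict_word": "0.5",
    "language_model_penalty_non_freq_dict_word": "0.3",
}


def render_config(settings):
    """Return config text with one 'name value' pair per line."""
    return "".join(f"{name} {value}\n" for name, value in settings.items())


def write_config(path, settings=SETTINGS):
    """Write the settings to `path` and return the text that was written."""
    text = render_config(settings)
    Path(path).write_text(text, encoding="ascii")
    return text


if __name__ == "__main__":
    # "dictfix" is a hypothetical file name.
    print(write_config("dictfix"), end="")
```

With 3.02 the resulting file would then be named as the last argument on the command line, for example `tesseract line.tif out -psm 7 dictfix`, where `line.tif` and `out` are placeholder names and `-psm 7` selects single-line mode.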
