Hi, I am trying to pass a string pattern to a TesseractEngine object when 
it is created.
I am using "A .Net wrapper for tesseract-ocr" 3.0.1.0 in C#.

Here is my code:


C# code

using( TesseractEngine engine = new TesseractEngine( 
    @"./tessdata", 
    "eng", 
    EngineMode.Default, 
    "bazzar" ) )   // important: loads the config file tessdata/configs/bazzar
{   
    // restrict recognition to upper-case letters, digits and '-'
    engine.SetVariable( "tessedit_char_whitelist", 
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-" );
    engine.SetVariable( "language_model_penalty_non_freq_dict_word", "1" );
    engine.SetVariable( "language_model_penalty_non_dict_word", "1" );

    // read the suffix back to check that the config file was applied
    string user_patterns_suffix;
    engine.TryGetStringVariable( "user_patterns_suffix", 
        out user_patterns_suffix );
    using( Page page = engine.Process( bitmap, PageSegMode.SingleLine ) )
    {
        ...
    }
}


tessdata/configs/bazzar

load_system_dawg     F
load_freq_dawg       F
user_words_suffix    user-words
user_patterns_suffix user-patterns


tessdata/eng.user-patterns

25\w\w\w\d\d


tessdata/eng.user-words

JAN
FEB
MAR
APR
MAY
JUN
JUL
AUG
OCT
SEP
NOV
DEC


TestImage.jpg

25MAR16


Output from tesseract:

25HAR16

I have successfully loaded the user-words and user-patterns files into the 
Tesseract engine.
However, Tesseract does not seem to consult my user-words list, because it 
keeps returning HAR instead of MAR.
How can I force the \w\w\w part of the pattern to be matched against the 
entries in the user-words list?
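
As a fallback, if the engine cannot be made to prefer the user-words entries, I could correct the result after recognition. Below is a minimal sketch of that idea, assuming the output always has the shape "two characters, three-letter month, two characters" as in 25MAR16. MonthSnapper and Fix are hypothetical names of my own, not part of the wrapper, and the month list simply repeats eng.user-words.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the Tesseract .NET wrapper.
public static class MonthSnapper
{
    // Same entries as tessdata/eng.user-words.
    private static readonly string[] Months =
    {
        "JAN", "FEB", "MAR", "APR", "MAY", "JUN",
        "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"
    };

    // Counts the positions at which two equal-length strings differ.
    private static int Hamming(string a, string b)
    {
        return a.Zip(b, (x, y) => x == y ? 0 : 1).Sum();
    }

    // Assumes text shaped like "25HAR16" (month at index 2..4).
    // Returns the input unchanged if the length is not 7, or if no
    // single month is within one letter of the recognized month.
    public static string Fix(string text)
    {
        if (text == null || text.Length != 7) return text;

        string month = text.Substring(2, 3);
        var ranked = Months
            .Select(m => new { Word = m, Distance = Hamming(m, month) })
            .OrderBy(c => c.Distance)
            .ToList();

        // Only correct when the best match is unambiguous and at most one letter off.
        if (ranked[0].Distance > 1 || ranked[0].Distance == ranked[1].Distance)
            return text;

        return text.Substring(0, 2) + ranked[0].Word + text.Substring(5);
    }
}
```

With this, MonthSnapper.Fix( page.GetText().Trim() ) would turn 25HAR16 into 25MAR16, while an ambiguous reading such as 25JUX16 (equally close to JUN and JUL) is left untouched. I would still prefer a solution inside Tesseract itself.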
