Hi, I am trying to pass a string pattern to a TesseractEngine object when
it is initialized.
I am using "A .Net wrapper for tesseract-ocr" 3.0.1.0 in C#.
Here is my code:
C# code
using( TesseractEngine engine = new TesseractEngine(
    @"./tessdata",
    "eng",
    EngineMode.Default,
    "bazzar" ) ) // here load config from bazzar *important*
{
    engine.SetVariable( "tessedit_char_whitelist",
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-" );
    engine.SetVariable( "language_model_penalty_non_freq_dict_word", "1" );
    engine.SetVariable( "language_model_penalty_non_dict_word", "1" );

    string user_patterns_suffix;
    engine.TryGetStringVariable( "user_patterns_suffix", out user_patterns_suffix );

    using( Page page = engine.Process( bitmap, PageSegMode.SingleLine ) )
    {
        ...
    }
}
tessdata/configs/bazzar
load_system_dawg F
load_freq_dawg F
user_words_suffix user-words
user_patterns_suffix user-patterns
tessdata/eng.user-patterns
25\w\w\w\d\d
tessdata/eng.user-words
JAN
FEB
MAR
APR
MAY
JUN
JUL
AUG
OCT
SEP
NOV
DEC
TestImage.jpg
25MAR16
Output from tesseract:
25HAR16
I have successfully loaded the user-words and user-patterns files into the
Tesseract engine.
But Tesseract doesn't seem to consult my user-words list, because it
keeps returning HAR instead of MAR.
How can I force the \w\w\w part of the pattern to be matched against the
user-words list?
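
To make clear what result I am after, here is a rough sketch of the same effect done as post-processing outside Tesseract. This is not a Tesseract feature and it uses none of the wrapper's API. The MonthSnap class and its Fix and Hamming methods are names I made up for illustration, and it assumes the text is always 7 characters shaped like DDMMMYY.

```csharp
using System;
using System.Linq;

static class MonthSnap
{
    // Same entries as tessdata/eng.user-words.
    static readonly string[] Months =
    {
        "JAN", "FEB", "MAR", "APR", "MAY", "JUN",
        "JUL", "AUG", "OCT", "SEP", "NOV", "DEC"
    };

    // Number of positions at which two equal-length strings differ.
    static int Hamming(string a, string b)
    {
        return a.Zip(b, (x, y) => x == y ? 0 : 1).Sum();
    }

    // Hypothetical helper: replaces the three characters after the day
    // with the closest month abbreviation. Input that is not exactly
    // 7 characters long is returned unchanged.
    public static string Fix(string text)
    {
        if (text == null || text.Length != 7) return text;

        string month = text.Substring(2, 3);
        // OrderBy is stable, so ties go to the earliest entry in Months.
        string best = Months.OrderBy(m => Hamming(m, month)).First();
        return text.Substring(0, 2) + best + text.Substring(5);
    }

    static void Main()
    {
        Console.WriteLine(Fix("25HAR16")); // prints 25MAR16
    }
}
```

HAR differs from MAR in one position and from every other month in at least two, so the sketch picks MAR. I would prefer that Tesseract reach the same result itself through the user-words list instead of a fix-up step like this.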