[tesseract-ocr] User-words with Tesseract 5

Natalia Zgirovskaya Mon, 23 Mar 2020 03:39:47 -0700

Hi all,

I have an issue providing a list of user words to Tesseract. I am on 
Windows 10.
Installed Tesseract version:


>tesseract.exe -v
tesseract v5.0.0-alpha.20191030
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
 Found libarchive 3.3.2 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5

My test image:

[image: test.jpg]

I have an "eng.user-words" file in the directory with the traineddata files. 
It contains:
B1adeb1ab1a


The config file "bazaar" is as follows:
load_system_dawg     F 
load_freq_dawg       F 
user_words_file  path/to/eng.user-words 
user_words_suffix user-words 
language_model_penalty_non_freq_dict_word 1 
language_model_penalty_non_dict_word 1

Running this command:
"C:\Program Files\Tesseract-OCR\tesseract.exe" test.jpg stdout -l eng bazaar
gives "Bladeblabla" instead of "B1adeb1ab1a".

This command, passing the word list directly, gives the same result:
"C:\Program Files\Tesseract-OCR\tesseract.exe" test.jpg stdout -l eng --user-words path/to/eng.user-words
It also gives "Bladeblabla" instead of "B1adeb1ab1a".
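
To make the second invocation easier to reproduce, here is a minimal Python sketch that writes the word list and assembles the same argument list. The helper names (write_user_words, build_command, run_ocr) are hypothetical and only for illustration. The executable path and the --user-words flag are taken from the command above. This only scripts the call and is not expected to change the recognition result.

```python
import subprocess
from pathlib import Path

# Install path as used in the commands above; adjust for other setups.
TESSERACT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def write_user_words(path, words):
    """Write the word list as plain text, one word per line (UTF-8)."""
    path = Path(path)
    path.write_text("".join(word + "\n" for word in words), encoding="utf-8")
    return path


def build_command(image, user_words, lang="eng", exe=TESSERACT):
    """Assemble the argument list for the --user-words invocation."""
    return [exe, str(image), "stdout", "-l", lang,
            "--user-words", str(user_words)]


def run_ocr(image, user_words):
    """Run Tesseract and return its stdout (requires Tesseract installed)."""
    cmd = build_command(image, user_words)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list avoids quoting problems with the space in "Program Files".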



What am I doing wrong?
