[tesseract-ocr] Can user-words file have punctuation? Should it have or not?

Mark Mon, 23 Nov 2015 12:35:50 -0800

Hello,

I use Tesseract 3.04 on Ubuntu 12.04.


I know all the words that appear in the papers I need to scan, and I plan to 
put all of them in the user-words file. However, many of them are enclosed in 
parentheses "()" or contain "-", and I also have words like these:

Anti-HAV
(Angoron)
AMH/MIS
Aντισ.Εναντι
(B-HCG)
C1
+CD56-NK
DHEA’S
HPL(
MTHFR-G20210-FV-LEIDEN
Resistance(VLEIDEN)
(TIB.C)
V(H1299R(R2))
β-FIBRINOGEN(-455G-A)
Pallidum)
ΤΟΞΟΠΛΑΣΜΑ(Τ.gondii)-ΑΝΙΧΝΕΥΣΗ
μgr/dl
Women"

Should I strip the punctuation from these words, or should I leave them as 
they are? They will only ever appear with this exact punctuation.

Are all of the examples above valid entries for the user-words file?

Also, because I need to scan papers in both English and Greek, I'm using the 
option "-l eng+ell", so I also put the Greek words in my eng.user-words 
file. Is that OK?
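
To make the setup concrete, here is a minimal sketch of how the word list 
could be written out and the command line assembled. It assumes the 
user-words file is plain UTF-8 text with one entry per line. The file names 
"eng.user-words" and "scan.png" and both helper functions are hypothetical, 
and the sketch says nothing about whether Tesseract accepts the punctuated 
entries, which is the open question here.

```python
from pathlib import Path

# A few entries from the list above, punctuation kept exactly as printed.
WORDS = ["Anti-HAV", "(Angoron)", "AMH/MIS", "(B-HCG)", "μgr/dl"]


def write_user_words(path, words):
    """Write one entry per line as UTF-8, skipping blanks and duplicates.

    Assumption: the user-words file is a plain-text list, one entry per line.
    """
    kept = []
    for word in words:
        word = word.strip()
        if word and word not in kept:
            kept.append(word)
    Path(path).write_text("\n".join(kept) + "\n", encoding="utf-8")
    return kept


def build_command(image, out_base, langs=("eng", "ell")):
    """Build the tesseract command line as a list; nothing is executed."""
    return ["tesseract", image, out_base, "-l", "+".join(langs)]


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    write_user_words("eng.user-words", WORDS)
    print(" ".join(build_command("scan.png", "out")))
```

Writing the file from a script keeps the encoding explicit, which matters 
when English and Greek entries share one list.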


