Hello, I use Tesseract 3.04 on Ubuntu 12.04.
I know all the words that appear in the papers I need to scan, and I'm going to put all of them in the user-words file. However, many of them are enclosed in brackets "()" or contain "-", and I also have words like these:

Anti-HAV (Angoron) AMH/MIS Aντισ.Εναντι (B-HCG) C1 +CD56-NK DHEA’S HPL( MTHFR-G20210-FV-LEIDEN Resistance(VLEIDEN) (TIB.C) V(H1299R(R2)) β-FIBRINOGEN(-455G-A) Pallidum) ΤΟΞΟΠΛΑΣΜΑ(Τ.gondii)-ΑΝΙΧΝΕΥΣΗ μgr/dl Women"

Should I strip the punctuation from these words, or should I leave them as they are? In the papers, these words will only ever appear with this exact punctuation. Are all of the examples above valid entries for the user-words file?

Also, because I need to scan papers in both English and Greek, I'm using the parameter "-l eng+ell", so I also put the Greek words in my eng.user-words file. Is that OK?
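For context, here is a minimal sketch of how the word list could be written out, assuming the user-words file is plain UTF-8 text with one entry per line. The helper name `write_user_words` and the temp-directory output path are placeholders of mine, not anything Tesseract defines, and the sketch does not answer whether Tesseract actually honours entries that contain punctuation.

```python
import tempfile
from pathlib import Path


def write_user_words(words, path):
    """Write each distinct, non-empty word on its own line (UTF-8),
    keeping the order in which the words were first seen."""
    seen = set()
    unique = []
    for word in words:
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            unique.append(word)
    Path(path).write_text("\n".join(unique) + "\n", encoding="utf-8")
    return unique


# A few of the words from my list, punctuation kept exactly as printed.
# "(B-HCG)" is repeated on purpose to show that duplicates are dropped.
sample = ["Anti-HAV", "(B-HCG)", "AMH/MIS", "β-FIBRINOGEN(-455G-A)", "μgr/dl", "(B-HCG)"]

# Hypothetical output location; the real file would live wherever the
# Tesseract setup expects eng.user-words.
out_path = Path(tempfile.gettempdir()) / "eng.user-words"
written = write_user_words(sample, out_path)
```

Since both English and Greek words go into the same file here, the file is written explicitly as UTF-8 so that entries such as μgr/dl survive intact.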

