Hi,

I have searched the internet extensively for information about the
--user-words option, but without success.

I am working on a simple case with the Spanish trained data, running:

api/tesseract -l spa --psm 6 test.png output tessdata/configs/unlv

I expect the following output from my image:

numero de documento

but instead I am getting:

mumsro ne odcumento

It's a little frustrating because those words don't exist in Spanish. So
I defined a spa.user-words file like this:

documento
numero

and ran the following command line:

api/tesseract --user-words spa.user_words -l spa --psm 6 test.png output tessdata/configs/unlv
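
For reference, here is a minimal sketch of the two steps above as one POSIX shell snippet. The variable name WORDS_FILE is hypothetical and not part of Tesseract. The word list is named with a hyphen above, while the command passes a name with an underscore; keeping the path in a single variable rules out that kind of mismatch. The OCR step is skipped if api/tesseract is not present.

```shell
# Hypothetical variable: the one path used both to create the word list
# and to pass it to --user-words.
WORDS_FILE="spa.user-words"

# Write the word list: one word per line, each terminated by a newline.
printf '%s\n' documento numero > "$WORDS_FILE"

# Run OCR only if the binary from the post exists and is executable.
if [ -x api/tesseract ]; then
    api/tesseract --user-words "$WORDS_FILE" -l spa --psm 6 \
        test.png output tessdata/configs/unlv
fi
```

This only guarantees that the file being written is the file being read; whether the recognizer actually takes the listed words into account is a separate question.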


But I still get bad OCR output:

numero de documento

Am I using the --user-words option correctly? Can I get the desired
results using this option?
Many thanks

PS: I have also unpacked spa.traineddata, added 'documento' and 'numero'
to spa.freq-dawg, and recombined it, but without any improvement.

