Newbie here, any help is appreciated. I'm fine-tuning the English LSTM model (extracted from eng.traineddata) on some handwriting data, prepared as box files and lstmf files. It works on a small test set. When I actually run it on the full set of 1500+ lstmf files, training also goes fine as long as I keep max_iterations under 2200 or so. But as soon as I go past some threshold, the model suddenly becomes unusable and spits out only CAPITAL letters with some odd punctuation (and the error rate shoots over 100%). One run even failed with a segmentation fault. Does this sound like it's running out of memory, or what are the other possible causes?
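For context, this is roughly the fine-tuning workflow I'm following (paths and the output prefix here are placeholders, not my exact setup), based on the standard Tesseract 4 LSTM training tools:

```shell
# Extract the LSTM component from the stock English traineddata
# (this is the checkpoint fine-tuning continues from).
combine_tessdata -e eng.traineddata eng.lstm

# Fine-tune on the prepared .lstmf files listed one-per-line in
# train_files.txt. max_iterations is the threshold I'm varying.
lstmtraining \
  --model_output output/handwriting \
  --continue_from eng.lstm \
  --traineddata eng.traineddata \
  --train_listfile train_files.txt \
  --max_iterations 2000
```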

