I have a set of English single-page TIFF document images that come with
ground truth (GT) files. Each TIFF has a single rectangular zone of text, and
each GT file is a UTF-8 text file containing the correct text.

I built Tesseract 3.03 from source and ran it on this set using the
English model that comes out of the box. Results were mixed, so the
question I am trying to answer is this:

Can I incrementally train Tesseract on part of this corpus to get
better accuracy?

I've been reading
https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 but it's
unclear to me whether incremental training is possible. Is it? If so, how
would I have to modify the training procedure so that it starts from the
previously trained data and extends it with whatever comes from the new data?

Thx


-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.