Hi,
I am trying to train *Tesseract OCR 4.0 using images* instead of fonts.
I used OCR-D to train on the images, but after 10000 iterations the error
rate remains at 100%. When I increased the iterations to 100000 (although
fewer iterations are recommended everywhere), the error rate dropped to 7.8%,
but testing the model still gives me poor results.
*I want to know whether this is a problem with my dataset, or what else
I could change*.
*Would fine-tuning with OCR-D improve my results?* I have also followed this link
<https://groups.google.com/forum/#!searchin/tesseract-ocr/fine$20tuning$20english$20language%7Csort:date/tesseract-ocr/be4-rjvY2tQ/32evtMHlAQAJ>
for fine-tuning instructions.
To my understanding, fine-tuning means retraining a pretrained
model (eng) with our own dataset, right?
But when I run the following command with CONTINUE_FROM=eng and
MODEL_NAME=my_ocr_model, I get another error:
lstmtraining \
--continue_from $(TESSDATA)/$(CONTINUE_FROM).lstm \
--old_traineddata $(TESSDATA)/$(CONTINUE_FROM).traineddata \
--traineddata data/$(MODEL_NAME)/$(MODEL_NAME).traineddata \
--model_output data/checkpoints/$(MODEL_NAME) \
--debug_interval -1 \
--train_listfile data/list.train \
--eval_listfile data/list.eval \
--sequential_training \
--max_iterations 10000
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
Makefile:118: recipe for target 'data/checkpoints/ocr_model_checkpoint'
failed
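
For context, the `!int_mode_` assert is commonly reported when the model passed to --continue_from is an integer (tessdata_fast style) model, which cannot be trained further, rather than a float model from tessdata_best. Whether that applies here is not confirmed. A minimal sketch of extracting the .lstm file from a tessdata_best model, assuming a hypothetical ./tessdata_best directory that already contains eng.traineddata (the directory name and variable names are illustrative, not taken from the Makefile):

```shell
# Assumption: TESSDATA_BEST points at a local copy of the float
# (tessdata_best) models; this path is hypothetical.
TESSDATA_BEST="${TESSDATA_BEST:-./tessdata_best}"
CONTINUE_FROM="${CONTINUE_FROM:-eng}"

SRC="$TESSDATA_BEST/$CONTINUE_FROM.traineddata"
DST="$TESSDATA_BEST/$CONTINUE_FROM.lstm"

# combine_tessdata -e extracts a single component (here the LSTM)
# from a .traineddata file into the given output file.
EXTRACT_CMD="combine_tessdata -e $SRC $DST"

# Only run the tool if it is installed and the source file exists;
# otherwise just print the command that would be run.
if command -v combine_tessdata >/dev/null 2>&1 && [ -f "$SRC" ]; then
  $EXTRACT_CMD
else
  echo "would run: $EXTRACT_CMD"
fi
```

The resulting eng.lstm and the matching eng.traineddata would then be the files given to --continue_from and --old_traineddata in the command above.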