[tesseract-ocr] Fine tuning ocr model - Poor detection results

Varun Sab Tue, 18 Sep 2018 22:19:46 -0700

Hi,
I am trying to train *Tesseract OCR 4.0 using images* instead of fonts.
I have used OCR-D to train on the images, but after 10000 iterations the 
error rate remains at 100%. When I increased the iterations to 100000 
(although fewer iterations are recommended everywhere), the error rate 
dropped to 7.8%, but testing the model gives me poor results.
*I wanted to know whether this is a problem with my dataset, or what else 
I could change.*
*Would fine-tuning with OCR-D improve my results?* I have also followed this link 
<https://groups.google.com/forum/#!searchin/tesseract-ocr/fine$20tuning$20english$20language%7Csort:date/tesseract-ocr/be4-rjvY2tQ/32evtMHlAQAJ>
for fine-tuning instructions.
As I understand it, fine-tuning means retraining a pretrained model (eng) 
on our own dataset, right?
But when I run the following command with CONTINUE_FROM=eng and 
MODEL_NAME=my_ocr_model, I get another error:


 lstmtraining \
      --continue_from   $(TESSDATA)/$(CONTINUE_FROM).lstm \
      --old_traineddata $(TESSDATA)/$(CONTINUE_FROM).traineddata \
      --traineddata data/$(MODEL_NAME)/$(MODEL_NAME).traineddata \
      --model_output data/checkpoints/$(MODEL_NAME) \
      --debug_interval -1 \
      --train_listfile data/list.train \
      --eval_listfile data/list.eval \
      --sequential_training \
      --max_iterations 10000

!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
Makefile:118: recipe for target 'data/checkpoints/ocr_model_checkpoint' failed
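
For context, the file passed to --continue_from in the command above is the LSTM component of the base model, which is normally extracted from the .traineddata file with combine_tessdata -e. The sketch below shows that step only; the ./tessdata_best directory is a hypothetical location, not something taken from my setup. One thing possibly worth checking (an assumption, not a confirmed diagnosis): the !int_mode_ assertion may mean the base model is an integer model, and as far as I know only the float models distributed as tessdata_best can be continued from.

```shell
#!/bin/sh
# Minimal sketch: extract the LSTM component that --continue_from expects.
# TESSDATA is a hypothetical directory assumed to hold a float (tessdata_best)
# copy of the base model; adjust it to your own layout.
TESSDATA=./tessdata_best
CONTINUE_FROM=eng
base="$TESSDATA/$CONTINUE_FROM"

# Only run the extraction if the tool and the base model are both present.
if command -v combine_tessdata >/dev/null 2>&1 && [ -f "$base.traineddata" ]; then
    # -e extracts the component named by the output file's extension (.lstm).
    combine_tessdata -e "$base.traineddata" "$base.lstm"
else
    echo "skipped: combine_tessdata or $base.traineddata not found" >&2
fi
```

The resulting eng.lstm would then be the file referenced by $(TESSDATA)/$(CONTINUE_FROM).lstm in the lstmtraining call.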
