[tesseract-ocr] Fine tuning with existing box/tiff pairs in Tesseract 4.0

an-an-kondratjeva Sat, 06 May 2017 03:00:07 -0700

Hello everyone,
I'm experimenting with handwriting recognition using Tesseract 4.0. More 
concretely, I want to train Tesseract to recognize one particular Russian 
handwriting style.
To do that, I want to add a "new font" (based on a set of TIFF images from 
a scanned archive, plus their box files) to the existing rus.traineddata 
by fine-tuning.
I've prepared the tiff/box pairs and then ran this command:


training/lstmtraining --model_output /.../rus_new/ \
  --continue_from /.../rus.lstm \
  --train_listfile /.../list_of_files.txt \
  --eval_listfile /.../list_of_files.txt \
  --max_iterations 5000


where "list_of_files.txt" contained:

/.../rus.Eskal_Font4You.exp0.tif
/.../rus.Eskal_Font4You.exp0.box
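
Since the error below complains about an empty document, one thing worth ruling out first is a simple path problem. This is a minimal sketch of such a check, assuming the list file holds one path per line; the helper name check_listfile is hypothetical and not part of Tesseract. It only confirms that each listed file exists and is non-empty, and says nothing about whether lstmtraining accepts that file type.

```shell
#!/bin/sh
# Hypothetical helper (not a Tesseract tool): report every entry in a
# training list file that is missing or empty.
# Assumption: the list file contains one path per line.
check_listfile() {
    listfile=$1
    status=0
    while IFS= read -r path || [ -n "$path" ]; do
        # Skip blank lines.
        [ -z "$path" ] && continue
        # -s is true only if the file exists and has a size greater than zero.
        if [ ! -s "$path" ]; then
            echo "missing or empty: $path"
            status=1
        fi
    done < "$listfile"
    return "$status"
}
```

Usage would be `check_listfile /.../list_of_files.txt`; it prints nothing and returns 0 when every entry is present and non-empty.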


...and it ended up with this error:

> First document cannot be empty!!
> num_pages_per_doc_ > 0:Error:Assert failed:in file imagedata.cpp, line 655


What am I missing?

Thanks in advance.
