[tesseract-ocr] OCR-D training process - High error rate [Tess 4]

Joe Wed, 04 Jul 2018 07:51:02 -0700

Hi everybody!

I'm trying the tool at https://github.com/OCR-D/ocrd-train/ but without 
success so far. Tesseract and Leptonica are installed by its scripts.
Following the test set provided in that repo, I created pairs of [*.tif, 
*.gt.txt] files using binarized characters and the TTFs of two fonts (1869 
text lines in total).
An example of my set is in the attachment, which also contains the files 
created by the training process.
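
A quick way to rule out pairing problems in a set like this is sketched below. 
It assumes that every line image name.tif sits in the same folder as a 
transcription name.gt.txt holding exactly one line of text. The function name 
check_pairs and the report keys are made up for this illustration; they are 
not part of ocrd-train or Tesseract.

```python
from pathlib import Path


def check_pairs(directory):
    """Report line images and transcriptions that do not form clean pairs.

    Hypothetical helper: assumes <name>.tif is paired with <name>.gt.txt
    in the same folder and that each transcription is a single text line.
    """
    root = Path(directory)
    # Base names of images: "line1.tif" -> "line1"
    images = {p.name[:-len(".tif")] for p in root.glob("*.tif")}
    # Base names of transcriptions: "line1.gt.txt" -> "line1"
    texts = {p.name[:-len(".gt.txt")] for p in root.glob("*.gt.txt")}

    report = {
        "missing_gt": sorted(images - texts),     # image without transcription
        "missing_image": sorted(texts - images),  # transcription without image
        "empty_gt": [],                           # transcription has no text
        "multiline_gt": [],                       # more than one non-blank line
    }
    for name in sorted(images & texts):
        content = (root / f"{name}.gt.txt").read_text(encoding="utf-8")
        lines = [ln for ln in content.splitlines() if ln.strip()]
        if not lines:
            report["empty_gt"].append(name)
        elif len(lines) > 1:
            report["multiline_gt"].append(name)
    return report
```

Calling check_pairs on the folder that holds the pairs and printing the 
result lists any file names that need a second look. It only checks the file 
layout, not whether the text actually matches the image.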


My guess is that something is wrong with my data.
Sometimes I see the char train value increasing instead of decreasing, 
and the final error rate is still too high (about 60%).

The new LSTM training process is driving me crazy!
I would appreciate it if anyone with experience could take a look at my data 
set.


Joe.


<<attachment: data.zip>>
