Hi everybody! I'm trying the tool at https://github.com/OCR-D/ocrd-train/ but without success so far. Tesseract and Leptonica are installed by its scripts. Inspired by the test set provided in that repo, I created [*.tif, *.gt.txt] pairs with binarized characters from two TTF fonts (1869 text lines in total). The attachment contains an example of my set, along with the files created by the training process.
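For anyone who wants to rule out simple pairing problems before looking at the training itself, here is a minimal sketch in Python. It assumes all files sit in one folder and that every image `x.tif` should have a one-line transcription `x.gt.txt`. The function name `check_pairs` and the folder name in the usage line are made up for illustration and are not part of ocrd-train.

```python
from pathlib import Path


def check_pairs(data_dir):
    """Return (file name, problem) tuples for a folder of *.tif / *.gt.txt pairs."""
    data_dir = Path(data_dir)
    # Strip the known suffixes by hand, because ".gt.txt" is a double extension.
    images = {p.name[:-len(".tif")] for p in data_dir.glob("*.tif")}
    texts = {p.name[:-len(".gt.txt")] for p in data_dir.glob("*.gt.txt")}

    problems = []
    for stem in sorted(images - texts):
        problems.append((stem + ".tif", "no matching .gt.txt"))
    for stem in sorted(texts - images):
        problems.append((stem + ".gt.txt", "no matching .tif"))

    # Assumption: each ground-truth file holds exactly one line of UTF-8 text.
    for stem in sorted(images & texts):
        gt_name = stem + ".gt.txt"
        content = (data_dir / gt_name).read_text(encoding="utf-8").strip()
        if not content:
            problems.append((gt_name, "empty transcription"))
        elif "\n" in content:
            problems.append((gt_name, "more than one line"))
    return problems


if __name__ == "__main__":
    # "data/ground-truth" is a placeholder; point it at your own folder.
    for name, problem in check_pairs("data/ground-truth"):
        print(f"{name}: {problem}")
```

An empty result only means the files pair up and each transcription is a single non-empty line. It says nothing about whether the text actually matches what is in the image.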
My guess is that something is wrong with my data. Sometimes I see the char train value increase instead of decrease, and the final error rate is still too high (about 60%). This new LSTM training process is driving me crazy! I would appreciate it if anyone with experience could take a look at my data set. Joe.
<<attachment: data.zip>>

