Hi everyone, I wanted to fine-tune the ben.traineddata model using some ancient texts that were supposedly typeset and printed. I have roughly 1k lines of text and tried the normal fine-tuning approach with around 25k iterations. What surprised me most was that even after packing the traineddata (the character error rate was around 4%) and testing on an unseen image, the performance was exactly the same as before. Not a single character was different! You can find the traineddata, the training data, the logs and the source code at this link: https://github.com/srdg/unarchived_ben_tess/releases/tag/v0.0.4-alpha
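To make the "not a single character was different" check concrete, here is a minimal sketch of how two OCR outputs could be compared against a ground-truth line. It is not taken from the linked repository; the function names `edit_distance` and `char_error_rate` are hypothetical, and it assumes the recognized text from each model has already been read into Python strings.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the ground-truth text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Placeholder strings only (assumption): substitute the ground truth and
    # the text produced by the base and the fine-tuned traineddata.
    truth = "example line"
    base_output = "exampIe line"
    tuned_output = "exampIe line"
    print("identical outputs:", base_output == tuned_output)
    print("base CER:", char_error_rate(truth, base_output))
    print("tuned CER:", char_error_rate(truth, tuned_output))
```

If both outputs are byte-for-byte identical, it may be worth confirming that the test run is really loading the newly packed traineddata rather than the original one.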
Can anyone tell me exactly what I am doing wrong here? Do I need to change any training parameter, increase my training data, or anything else completely? Best regards, Soumik

