Hello, I simply cannot find the answer to this seemingly simple question. I am trying to create a fresh *ground truth* for a highly limited set of fonts, for training *tesseract 4.x*.
Using *text2image*, I have rendered a large TIF image and the corresponding BOX file from a 100-line text file. My understanding is that this large image is not suitable for training, and that I *must* break it down into single-line images and txt files before I can start training. Am I mistaken?

Now I am trying to continue with the tools in the *tesseract-ocr/tesstrain* repo (to generate all those small images), but *generate_gt_from_box.py*, for example, outputs nothing. Nor can I see how any of the *Makefile* targets apply to my goal.

Please help, thanks!

I have searched for days, so I also really wonder *where* I could have found the answer to this myself. There are so many READMEs and resources all over the place that I feel like I might be staring at the answer without realising it.

