[tesseract-ocr] Render Ground Truth from Scratch for Training

Adam Fri, 20 Oct 2023 08:23:10 -0700

Hello, I simply cannot find the answer to this seemingly simple 
question. I am trying to create a fresh *ground truth* for a highly limited 
set of fonts, for training *tesseract 4.x*.


Using *text2image* I have rendered a large TIF image and the 
corresponding BOX file from a 100-line text file.

My understanding is that this large image is not suitable for training, and 
that I *must* break this down into single line images and txt files, to 
start training. Am I mistaken?
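
To make concrete what I mean by "breaking it down", here is a minimal Python sketch of the text half of that split. It assumes, from my reading of the tesstrain README, that each line image needs a transcription file with the same base name and a `.gt.txt` extension. The function name, the `line` prefix and the numbering scheme are my own placeholders, not anything taken from tesstrain.

```python
from pathlib import Path


def split_into_line_files(text_file, out_dir, prefix="line"):
    """Write each non-empty line of text_file to its own <prefix>_NNNN.gt.txt file.

    The naming scheme is a hypothetical choice; only the .gt.txt extension
    is meant to follow the tesstrain convention.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    lines = Path(text_file).read_text(encoding="utf-8").splitlines()
    for line in lines:
        line = line.strip()
        if not line:
            # Skip blank lines so every output file holds exactly one text line.
            continue
        target = out / f"{prefix}_{len(written):04d}.gt.txt"
        target.write_text(line + "\n", encoding="utf-8")
        written.append(target)
    return written
```

Each resulting file would then still need a matching single-line image with the same base name, which is the part I cannot work out how to produce.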

Now I am trying to continue with the tools in the *tesseract-ocr/tesstrain* 
repo (to generate all those small images), but, for example, 
*generate_gt_from_box.py* outputs nothing. Nor can I see how any of the 
*Makefile* targets apply to my goal.


Please help, thanks!
_______________________________________________________________________________
I have searched for days, so I also really wonder *where* I could have 
found the answer to this myself. There are so many READMEs and resources  
all over the place, so I feel like I might be staring at the answer without 
realising it.

