Hello,

I would like to train *Tesseract 4* to recognize certain scripts/languages 
based on real images rather than synthetic ones. Here are my questions:

1. Is there a tool, preferably a cross-platform (Windows/Linux) GUI, that 
assists in creating .box files from scanned images? For example, how do I 
get the coordinates of text lines?

2. Is there a YouTube/video tutorial describing how to prepare .tiff/.box 
files from real scans?

3. Which gives better recognition: training on real images or training 
on synthetic images?

4. How many text lines from real scans do I need to get good recognition?

Thank you very much!
ST
