[tesseract-ocr] How to efficiently extend the training_text file?

peter bence Thu, 10 Oct 2019 00:16:24 -0700

I'm working with the Arabic `langdata_lstm` data, whose `training_text` file 
contains only 84 lines of text, which I believe is too small for building 
and training a reliable model. Reading the `training_text` file, I can see 
that it consists of randomly generated text with no meaning. At first I 
thought this was specific to Arabic, but I later found that the same is 
true for all other languages.
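A quick way to reproduce the line count above for any language's `training_text` file is a short Python script like the following. It is only a sketch: it counts non-empty lines and also reports the longest line, measured in Unicode code points, so Arabic diacritics count as separate characters. Whether Tesseract's own tooling measures line length the same way is an assumption.

```python
import sys


def line_stats(text: str) -> tuple[int, int]:
    """Return (number of non-empty lines, longest line length in code points)."""
    lines = [line for line in text.splitlines() if line.strip()]
    longest = max((len(line) for line in lines), default=0)
    return len(lines), longest


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (placeholder path): python line_stats.py path/to/training_text
    with open(sys.argv[1], encoding="utf-8") as f:
        count, longest = line_stats(f.read())
    print(f"{count} lines, longest line: {longest} characters")
```

The script name and path are placeholders; point it at whichever `training_text` file you want to inspect.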


*My questions are:*

1. What specifications are followed when generating these `training_text` 
files? For example, I can see that no line is longer than 60 characters; 
is this one of the specifications?

2. Could I simply extend the `training_text` file, generate my training 
data with custom fonts, and start training directly? Or are there other 
files that must be changed after changing this file? If so, which files 
are they, and how can I regenerate them?
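For experimenting with the second question, the following is a minimal sketch of extending a `training_text` file from your own corpus. It assumes the 60-character limit observed in question 1 is worth keeping (that value comes from the observation, not from any documented Tesseract requirement), wraps the new text on word boundaries with Python's standard `textwrap` module, and skips lines that already exist. The function names are hypothetical. It only produces text; it does not regenerate any of the other `langdata_lstm` files the question asks about.

```python
import sys
import textwrap


def wrap_corpus(new_text: str, width: int = 60) -> list[str]:
    """Split free text into lines of at most `width` characters on word boundaries."""
    lines = []
    for paragraph in new_text.splitlines():
        # Collapse runs of whitespace first; textwrap keeps internal spaces as-is.
        normalized = " ".join(paragraph.split())
        # break_long_words=False keeps a single word longer than `width` intact
        # on its own line instead of splitting it mid-word.
        lines.extend(textwrap.wrap(normalized, width=width, break_long_words=False))
    return lines


def extend_training_text(existing: list[str], new_text: str, width: int = 60) -> list[str]:
    """Append wrapped lines from new_text to existing, skipping exact duplicates."""
    seen = set(existing)
    result = list(existing)
    for line in wrap_corpus(new_text, width):
        if line not in seen:
            seen.add(line)
            result.append(line)
    return result


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (placeholder names):
    #   python extend_text.py old.training_text my_corpus.txt > new.training_text
    with open(sys.argv[1], encoding="utf-8") as f:
        existing = f.read().splitlines()
    with open(sys.argv[2], encoding="utf-8") as f:
        extra = f.read()
    sys.stdout.write("\n".join(extend_training_text(existing, extra)) + "\n")
```

Deduplication keeps the original lines first and in order, so the existing file content is preserved unchanged at the top of the output.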

Best Regards

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/f40d972a-50d8-4a17-b69c-3f83271b3af8%40googlegroups.com.
