As it is not properly possible to combine my from-scratch traineddata with
an existing one, I have decided to also train my traineddata model for numbers.
Therefore I wrote a script which synthetically generates ground-truth data
with text2image.
This script uses dozens of different fonts and creates
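A minimal sketch of such a generator, assuming text2image's standard flags (--text, --font, --fonts_dir, --outputbase, --degrade_image); the font names, paths, and the echo dry-run are placeholders, not the poster's actual script:

```shell
#!/bin/sh
# Dry run: print one text2image command per font so the loop can be
# inspected without the Tesseract training tools installed; drop the
# "echo" to actually render. Fonts and paths are placeholders.
i=0
for font in "Arial" "DejaVu Sans" "Liberation Mono"; do
  echo text2image --text=numbers.training_text \
    --font="$font" --fonts_dir=/usr/share/fonts \
    --outputbase="groundtruth/numbers_$i" --degrade_image
  i=$((i + 1))
done
```

A real script would loop over dozens of fonts and write one image/box pair per font into the ground-truth directory.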
The files will be at Google. You have to wait till Ray Smith updates the
repository.
ShreeDevi
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
On Tue, Aug 22, 2017 at 12:58 PM, wrote:
Thanks for your reply.
Do you know where I can find the new langdata files?
On Tuesday, August 22, 2017 at 3:22:36 PM UTC+8, shree wrote:
The langdata files have not been updated for 4.00alpha
ShreeDevi
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
On Tue, Aug 22, 2017 at 12:17 PM, wrote:
Hello,
I'm trying to re-train the chi_sim.traineddata model from scratch for
studying.
I use the chi_sim.training_text source data from the directory
https://github.com/tesseract-ocr/langdata/tree/master/chi_sim to train the
model with the command:
training/lstmtraining --debug_interval
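The truncated command could be fleshed out roughly as below. This is an assumption-laden sketch, not the poster's actual invocation: the file paths are placeholders, the --net_spec is the example network specification from the TrainingTesseract-4.00 wiki, and it presumes the .lstmf list files and a starter traineddata have already been built.

```shell
# Sketch only: placeholder paths; assumes the lstmf list and starter
# traineddata already exist (e.g. built with the training tools).
training/lstmtraining \
  --debug_interval -1 \
  --traineddata chi_sim/chi_sim.traineddata \
  --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx192 O1c1]' \
  --model_output output/chi_sim \
  --train_listfile chi_sim.training_files.txt \
  --max_iterations 10000
```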
also see
https://github.com/tesseract-ocr/tesseract/blob/master/contrib/genlangdata.pl
ShreeDevi
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
On Sat, May 20, 2017 at 10:12 AM, ShreeDevi Kumar wrote:
Google has not shared its method of training with complete scripts etc. The
training instructions on the wiki are only a tutorial for learning about LSTM
training.
Please also see https://github.com/tesseract-ocr/tesseract/issues/644
ShreeDevi
I have already been going through language-specific.sh, but I still have a
few questions I hope someone can answer.
My initial question, I guess, is: were there other tools used to create the
training data for the English model that is currently provided (other than
the ones provided on git)? i.e.
As per Ray, 4500 fonts and 40 lines of text were used to create the
models of Latin-script-based languages. So I am not sure whether you can
replicate the model.
For language-specific exposure settings etc. see
https://github.com/tesseract-ocr/tesseract/blob/master/training/language-specific.sh
If training tesseract 4 from scratch (English, for example), I know I need to have
the proper fonts installed, but what other parameters would be needed to
produce the same model for English? i.e., what exposure settings were used to
degrade the images, etc.?
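For reference, language-specific.sh varies degradation through a per-language list of exposure values passed to text2image. A hedged sketch of that idea, echoing one command per exposure level (the exposure values here are illustrative, not the ones Google used):

```shell
#!/bin/sh
# Print one rendering command per exposure level; text2image's
# --exposure flag controls the simulated exposure/degradation.
# Drop the "echo" to actually render. Values are illustrative only.
for exp in -3 -2 -1 0 1 2 3; do
  echo text2image --text=eng.training_text --font="Arial" \
    --exposure="$exp" --outputbase="eng.Arial.exp$exp"
done
```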