[tesseract-ocr] Training from Scratch

2023-11-22 Thread Simon
As it is not really possible to combine my traineddata trained from scratch with an existing one, I have decided to also train my traineddata model on numbers. I therefore wrote a script that synthetically generates ground-truth data with text2image. This script uses dozens of different fonts and create
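
The approach described in this message (rendering one text file with many fonts through text2image) might be sketched as below. This is not Simon's script, which is not shown in the thread. The function name, the font names, the file paths, and the default fonts directory are hypothetical. The options `--text`, `--outputbase`, `--font`, `--fonts_dir` and `--exposure` are text2image options. The sketch only builds the command lines and does not run them.

```python
import shlex
from itertools import product
from pathlib import PurePosixPath


def build_text2image_commands(text_file, fonts, exposures, out_dir,
                              fonts_dir="/usr/share/fonts"):
    """Return one text2image argument list per (font, exposure) pair.

    All names and paths here are illustrative assumptions.
    """
    commands = []
    for font, exposure in product(fonts, exposures):
        # File-system-friendly output base, e.g. out/Arial_Bold.exp-1
        base = PurePosixPath(out_dir) / f"{font.replace(' ', '_')}.exp{exposure}"
        commands.append([
            "text2image",
            f"--text={text_file}",
            f"--outputbase={base}",
            f"--font={font}",
            f"--fonts_dir={fonts_dir}",
            f"--exposure={exposure}",
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical inputs: a text file of number strings and two fonts.
    for cmd in build_text2image_commands(
            "numbers.txt", ["Arial Bold", "DejaVu Sans"], [-1, 0], "out"):
        print(shlex.join(cmd))
```

Each printed line can be inspected before being passed to a shell or to `subprocess.run`. Whether a given font name is accepted depends on what is installed in the fonts directory.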

Re: [tesseract-ocr] Training from scratch to re-train the chi_sim.traineddata for studying

2017-08-22 Thread ShreeDevi Kumar
The files will be at Google. You have to wait till Ray Smith updates the repository. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Tue, Aug 22, 2017 at 12:58 PM, wrote: > Thanks for your reply. > > Do you know where

Re: [tesseract-ocr] Training from scratch to re-train the chi_sim.traineddata for studying

2017-08-22 Thread robertyoung0511
Thanks for your reply. Do you know where I can find the new langdata files? On Tuesday, August 22, 2017 at 3:22:36 PM UTC+8, shree wrote: > > The langdata files have not been updated for 4.00alpha > > ShreeDevi > > भजन - कीर्तन - आरती @ http://bhajans.rampar

Re: [tesseract-ocr] Training from scratch to re-train the chi_sim.traineddata for studying

2017-08-22 Thread ShreeDevi Kumar
The langdata files have not been updated for 4.00alpha ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Tue, Aug 22, 2017 at 12:17 PM, wrote: > Hello, > > I'm trying to re-train the chi_sim.traineddata model from scrat

[tesseract-ocr] Training from scratch to re-train the chi_sim.traineddata for studying

2017-08-21 Thread robertyoung0511
Hello, I'm trying to re-train the chi_sim.traineddata model from scratch for study purposes. As source data I use chi_sim.training_text from the directory https://github.com/tesseract-ocr/langdata/tree/master/chi_sim and train the model with the command: training/lstmtraining --debug_interval

[tesseract-ocr] Training from Scratch for chi_sim.traineddata

2017-08-21 Thread robertyoung0511
Hello, I'm trying to re-train the chi_sim.traineddata model from scratch for study purposes. As source data I use chi_sim.training_text from the directory https://github.com/tesseract-ocr/langdata/tree/master/chi_sim and train the model with the command: training/lstmtraining --debug_interval

Re: [tesseract-ocr] Training from scratch

2017-05-20 Thread ShreeDevi Kumar
Also see https://github.com/tesseract-ocr/tesseract/blob/master/contrib/genlangdata.pl ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sat, May 20, 2017 at 10:12 AM, ShreeDevi Kumar wrote: > Google has not shared its

Re: [tesseract-ocr] Training from scratch

2017-05-19 Thread ShreeDevi Kumar
Google has not shared its method of training with complete scripts etc. The training instructions on wiki are only a tutorial for learning about LSTM training. Please also see https://github.com/tesseract-ocr/tesseract/issues/644 ShreeDevi

Re: [tesseract-ocr] Training from scratch

2017-05-19 Thread aggiedude
I have already been going through language-specific.sh, but I still have a few questions I hope someone can answer. My initial question, I guess, is: were there other tools used to create the training data for the English model that is currently provided (other than the ones provided on git)? i.e.

Re: [tesseract-ocr] Training from scratch

2017-05-19 Thread ShreeDevi Kumar
As per Ray, 4500 fonts and 40 lines of text were used to create the models for Latin-script-based languages, so I am not sure whether you can replicate the model. For language-specific exposure settings etc. see https://github.com/tesseract-ocr/tesseract/blob/master/training/language-specific.s

[tesseract-ocr] Training from scratch

2017-05-19 Thread aggiedude
If training Tesseract 4 from scratch, English for example, I know I need to have the proper fonts installed, but what other parameters would be needed to produce the same model for English? I.e., what exposure settings were used to degrade images, etc.?