[tesseract-ocr] Re: Training Tesseract 5.0.0 to recognize digital handwriting

2020-01-15 Thread 'Fabio Lugli' via tesseract-ocr
After some work I am able to: - use the *lstmbox* config of *tesseract.exe* to obtain the *.box* files of my *.tif* images - use the third-party software *JTessBoxEditor* to correct the recognized characters, leaving boxes around the full line of text - use the *lstm.train* config of
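A minimal sketch of the two per-image tesseract steps above. It assumes the line images sit in a hypothetical folder named data and that tesseract is on the PATH. It only builds and prints the command lines. Using --psm 13 (raw line) for single-line images is my assumption, not something stated in this thread.

```python
import shlex
from pathlib import Path

def box_command(tif: Path) -> list:
    """tesseract <image> <base> lstmbox -> writes <base>.box (LSTM boxes)."""
    base = tif.with_suffix("")  # tesseract appends the .box extension itself
    return ["tesseract", str(tif), str(base), "lstmbox"]

def lstmf_command(tif: Path, psm: int = 13) -> list:
    """tesseract <image> <base> --psm N lstm.train -> writes <base>.lstmf.

    Expects the corrected <base>.box next to the image, same base name.
    """
    base = tif.with_suffix("")
    return ["tesseract", str(tif), str(base), "--psm", str(psm), "lstm.train"]

if __name__ == "__main__":
    # "data" is a hypothetical folder of single-line .tif images.
    for tif in sorted(Path("data").glob("*.tif")):
        print(shlex.join(box_command(tif)))
        print(shlex.join(lstmf_command(tif)))
```

Building the commands separately makes it easy to print them first and only run them (for example with subprocess.run) once they look right.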

Re: [tesseract-ocr] Training Tesseract 5.0.0 to recognize digital handwriting

2020-01-15 Thread 'Fabio Lugli' via tesseract-ocr
I tried this path again, not remembering where I got stuck, and after following all the instructions and running *make training*, the terminal is stuck at the first step, *unicharset_extractor --output_unicharset "data/eng/unicharset" --norm_mode 2 "data/eng/all-gt"*. From here it does nothing,
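As far as I know, tesstrain builds data/eng/all-gt by concatenating all the *.gt.txt transcriptions before running unicharset_extractor on it, so one quick sanity check is whether that file exists and actually has content. A sketch that rebuilds it by hand, assuming the ground truth lives in data/eng-ground-truth (my assumption about the default layout for MODEL_NAME=eng):

```python
from pathlib import Path

def join_ground_truth(texts) -> str:
    """Join single-line transcriptions into all-gt content, one line each."""
    lines = [t.rstrip("\r\n") for t in texts]
    if not lines:
        raise ValueError("no *.gt.txt transcriptions found")
    return "\n".join(lines) + "\n"

def build_all_gt(gt_dir: Path, all_gt: Path) -> int:
    """Write all_gt from every *.gt.txt in gt_dir; return how many were used."""
    files = sorted(gt_dir.glob("*.gt.txt"))
    content = join_ground_truth(f.read_text(encoding="utf-8") for f in files)
    all_gt.parent.mkdir(parents=True, exist_ok=True)
    # newline="\n" keeps LF line endings even on Windows
    with all_gt.open("w", encoding="utf-8", newline="\n") as out:
        out.write(content)
    return len(files)

if __name__ == "__main__":
    n = build_all_gt(Path("data/eng-ground-truth"), Path("data/eng/all-gt"))
    print(f"wrote {n} transcriptions")
```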

Re: [tesseract-ocr] Training Tesseract 5.0.0 to recognize digital handwriting

2020-01-15 Thread 'Fabio Lugli' via tesseract-ocr
Thanks for the suggestion, I already tried this one but I will try again! On Wednesday, 15 January 2020 15:29:57 UTC+1, shree wrote: > Take a look at tesseract-ocr/tesstrain
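tesstrain pairs each line image with a transcription of the same base name (foo.tif with foo.gt.txt), and as far as I remember it looks for them in data/<MODEL_NAME>-ground-truth by default. A small check for images or transcriptions missing their counterpart; the set of accepted image suffixes is an assumption:

```python
from pathlib import Path

IMAGE_SUFFIXES = (".tif", ".png")  # assumed set of line-image extensions

def unpaired(names) -> list:
    """Base names that have an image but no .gt.txt, or vice versa."""
    names = list(names)
    images = {n.rsplit(".", 1)[0] for n in names if n.endswith(IMAGE_SUFFIXES)}
    texts = {n[: -len(".gt.txt")] for n in names if n.endswith(".gt.txt")}
    return sorted(images ^ texts)  # symmetric difference = missing a partner

if __name__ == "__main__":
    gt_dir = Path("data/eng-ground-truth")  # hypothetical, for MODEL_NAME=eng
    for base in unpaired(p.name for p in gt_dir.iterdir()):
        print("unpaired:", base)
```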

Re: [tesseract-ocr] Re: Training Tesseract 5.0.0 to recognize digital handwriting

2020-01-16 Thread 'Fabio Lugli' via tesseract-ocr
Galbhtha , MUP foe Manomadl) Cxclaomgle > File eng.test.pro5.lstmf line 0 : > Mean rms=3.708%, delta=18.921%, train=78.266%(95.833%), skip ratio=0% > Iteration 4: GROUND TRUTH : nominating any more Labour life Peers > Iteration 4: BEST OCR TEXT : wominading any wone Loabour Lfe. "
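Summary lines like the one quoted above can be pulled out of a saved lstmtraining log to follow progress over time. My understanding is that train= is the character error rate and the value in parentheses is the word error rate, but treat that as an assumption. A sketch matching the format shown here:

```python
import re
import sys

# Pattern for the summary line format quoted above (Mean rms=..., skip ratio=...).
SUMMARY = re.compile(
    r"Mean rms=(?P<rms>[\d.]+)%, delta=(?P<delta>[\d.]+)%, "
    r"train=(?P<char_err>[\d.]+)%\((?P<word_err>[\d.]+)%\), "
    r"skip ratio=(?P<skip>[\d.]+)%"
)

def parse_summary(line: str):
    """Return the numbers from one summary line as floats, or None."""
    m = SUMMARY.search(line)
    if m is None:
        return None
    return {key: float(value) for key, value in m.groupdict().items()}

if __name__ == "__main__":
    # e.g. pipe a saved training log into this script on stdin
    for line in sys.stdin:
        stats = parse_summary(line)
        if stats:
            print(stats["char_err"], stats["word_err"])
```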

Re: [tesseract-ocr] Re: Training Tesseract 5.0.0 to recognize digital handwriting

2020-01-16 Thread 'Fabio Lugli' via tesseract-ocr
Yes, I was setting debug_interval -1, but on Windows it didn't show the error after iteration 0. After installing Ubuntu on WSL and repeating the whole process, it showed that the problem was the *eng.traineddata* I was mistakenly using, which wasn't from tessdata_best, so the already seen *integer
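For reference, a sketch of a fine-tuning invocation with --debug_interval -1 that continues from the LSTM component of a tessdata_best model. The integer tessdata_fast models can't be used as a starting point for training, which seems to match the problem described here. All paths and the iteration count are hypothetical, and my reading of what -1 does is an assumption:

```python
import shlex

def lstmtraining_command(start_lstm: str, traineddata: str, model_output: str,
                         train_list: str, max_iterations: int = 400) -> list:
    """Build a fine-tuning command; start_lstm should come from tessdata_best."""
    return [
        "lstmtraining",
        "--continue_from", start_lstm,      # float LSTM extracted from eng.traineddata
        "--traineddata", traineddata,       # the same tessdata_best eng.traineddata
        "--model_output", model_output,     # prefix for checkpoint files
        "--train_listfile", train_list,     # text file: one .lstmf path per line
        "--max_iterations", str(max_iterations),
        "--debug_interval", "-1",           # -1: text debug every iteration (assumed)
    ]

if __name__ == "__main__":
    print(shlex.join(lstmtraining_command(
        "tessdata/unpacked/eng.lstm", "tessdata/eng.traineddata",
        "output/handwriting", "data/all-lstmf")))
```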

Re: [tesseract-ocr] Re: Training Tesseract 5.0.0 to recognize digital handwriting

2020-01-16 Thread 'Fabio Lugli' via tesseract-ocr
> better to use Linux. > > On Thu, Jan 16, 2020 at 6:30 PM 'Fabio Lugli' via tesseract-ocr <tesser...@googlegroups.com> wrote: >> Thank you very much, now I can see them. But obviously, after one simple step forward, here is another wall:

Re: [tesseract-ocr] Re: Training Tesseract 5.0.0 to recognize digital handwriting

2020-01-16 Thread 'Fabio Lugli' via tesseract-ocr
Thank you very much, now I can see them. But obviously, after one simple step forward, here is another wall: *Warning: LSTMTrainer deserialized an LSTMRecognizer!* *Continuing from ./tessdata/unpacked/eng.lstm* *Loaded 1/1 lines (1-1) of document eng.test.pro0.lstmf* *Loaded 1/1 lines
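The ./tessdata/unpacked/eng.lstm that training continues from is the LSTM component of eng.traineddata. A sketch of how it can be extracted with combine_tessdata -e, assuming the tessdata_best file sits in ./tessdata and that the output folder already exists:

```python
import shlex
from pathlib import PurePath

def extract_lstm_command(traineddata: str, out_dir: str) -> list:
    """combine_tessdata -e <lang>.traineddata <out_dir>/<lang>.lstm"""
    lang = PurePath(traineddata).stem  # "eng" for eng.traineddata
    target = PurePath(out_dir) / f"{lang}.lstm"
    return ["combine_tessdata", "-e", str(traineddata), str(target)]

if __name__ == "__main__":
    print(shlex.join(extract_lstm_command("tessdata/eng.traineddata",
                                          "tessdata/unpacked")))
```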

[tesseract-ocr] Training Tesseract 5.0.0 to recognize digital handwriting

2020-01-14 Thread 'Fabio Lugli' via tesseract-ocr
Hello everyone, I'm trying to train Tesseract on handwriting, knowing that it's not the best option, using the latest version available for Windows. I have access to a large number of .tif files, each a line of handwritten text. I'm able to obtain the .box files, which I later edit to be compliant with
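Since each .tif holds a single line, one way to get box files without per-character editing is the line-level WordStr format described in the Tesseract 4 training notes: one WordStr box with the transcription after #, followed by a tab line marking the end of the line. A sketch, assuming both boxes simply cover the full image; double-check the format against the training documentation before relying on it:

```python
def wordstr_box(text: str, width: int, height: int, page: int = 0) -> str:
    """Line-level box content for a single-line image of width x height px.

    Box coordinates are left bottom right top (origin at the bottom-left);
    here both boxes span the whole image (an assumption).
    """
    line = f"WordStr 0 0 {width} {height} {page} #{text}"
    end_of_line = f"\t 0 0 {width} {height} {page}"  # tab marks end of line
    return f"{line}\n{end_of_line}\n"

if __name__ == "__main__":
    # Hypothetical image size; in practice read it from the image itself.
    print(wordstr_box("nominating any more Labour life Peers", 1200, 64), end="")
```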

[tesseract-ocr] Re: Training Tesseract 5.0.0 to recognize digital handwriting

2020-01-20 Thread 'Fabio Lugli' via tesseract-ocr
After working on my dataset for a couple of days, I have seen that the model fine-tuned on handwritten text gets better on some lines of text but worse on others, so I trained again and the results didn't change. Is it normal that the model gets better on some text but worse on other text over each
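To judge whether the fine-tuned model is better overall, rather than comparing individual lines by eye, one option is to compute a character error rate per line for both models on the same lines, plus an aggregate over all of them. A sketch using a plain Levenshtein distance; truths, base_out and tuned_out are hypothetical lists of ground-truth lines and the OCR output collected from each model:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions), two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]

def cer(truth: str, ocr: str) -> float:
    """Character error rate of one line against its ground truth."""
    return levenshtein(truth, ocr) / max(len(truth), 1)

def compare(truths, base_out, tuned_out):
    """Per-line (base CER, tuned CER) and the overall CER of each model."""
    rows = [(cer(t, b), cer(t, f)) for t, b, f in zip(truths, base_out, tuned_out)]
    total = max(sum(len(t) for t in truths), 1)
    base_total = sum(levenshtein(t, b) for t, b in zip(truths, base_out)) / total
    tuned_total = sum(levenshtein(t, f) for t, f in zip(truths, tuned_out)) / total
    return rows, base_total, tuned_total
```

The aggregate numbers give a clearer picture of whether fine-tuning helped than a handful of individual lines.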

Re: [tesseract-ocr] Re: Training Tesseract 5.0.0 to recognize digital handwriting

2020-01-16 Thread 'Fabio Lugli' via tesseract-ocr
I still get the error, but I figured out that it's caused by how I write the *all-lstmf* file, from which lstmtraining can't get the images. Right now I write into it: *[FULL PATH TO MY FILE]/eng.test.pro0.lstmf* *[FULL PATH TO MY FILE]/eng.test.pro1.lstmf* *[FULL PATH TO MY FILE]/eng.test.pro2.lstmf* etc.
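The list passed to --train_listfile is a plain-text file with one .lstmf path per line. A sketch that rereads such a list and reports entries that don't exist on disk. It prints them with repr so stray characters such as a trailing \r or a leftover * become visible. It then rewrites the list with absolute paths and LF line endings. data/all-lstmf is a hypothetical location:

```python
from pathlib import Path

def listfile_text(lstmf_paths) -> str:
    """Plain-text list for --train_listfile: one absolute path per line."""
    return "".join(f"{Path(p).resolve()}\n" for p in lstmf_paths)

def missing_files(lstmf_paths) -> list:
    """Entries that don't point to an existing file."""
    return [p for p in lstmf_paths if not Path(p).is_file()]

if __name__ == "__main__":
    listfile = Path("data/all-lstmf")  # hypothetical location
    entries = [l.strip() for l in listfile.read_text(encoding="utf-8").splitlines()
               if l.strip()]
    for p in missing_files(entries):
        print("not found:", repr(p))
    # newline="\n" avoids CRLF endings when this runs on Windows
    with listfile.open("w", encoding="utf-8", newline="\n") as f:
        f.write(listfile_text(entries))
```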