After some work I am able to:
- Use the *lstmbox* config of *tesseract.exe* to obtain the *.box* files for 
my *.tif* images
- Use the third-party software *JTessBoxEditor* to correct the recognized 
characters, leaving each box spanning the full line of text
- Use the *lstm.train* config of *tesseract.exe* to obtain the *.lstmf* files 
from the *.box* files
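
For reference, the two tesseract steps above can be sketched as command lines built in Python. This is only an illustration under assumptions: the directory name `lines` and both helper functions are hypothetical, `tesseract` is assumed to be on the PATH, and the *.box* file is assumed to sit next to its *.tif* with the same base name when the *lstm.train* step runs. No page segmentation mode is set here, although one may be needed for single-line images.

```python
from pathlib import Path


def box_command(image: Path) -> list[str]:
    """Build the command that writes <base>.box using the lstmbox config."""
    base = image.with_suffix("")  # tesseract appends the extension itself
    return ["tesseract", str(image), str(base), "lstmbox"]


def lstmf_command(image: Path) -> list[str]:
    """Build the command that writes <base>.lstmf using the lstm.train config."""
    base = image.with_suffix("")
    return ["tesseract", str(image), str(base), "lstm.train"]


if __name__ == "__main__":
    # "lines" is a hypothetical folder holding the .tif line images.
    for tif in sorted(Path("lines").glob("*.tif")):
        print(" ".join(box_command(tif)))
        print(" ".join(lstmf_command(tif)))
```

The script only prints the commands, so they can be inspected before being run by hand or passed to `subprocess.run`.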

Now, when I run *lstmtraining.exe* with *eng.traineddata* as the 
starter traineddata, I get these errors:

*Deserialize header failed: [myfile1].lstmf*
*Deserialize header failed: [myfile2].lstmf*
*Deserialize header failed: [myfile3].lstmf*
*Loaded 1/1 lines (1-1) of document [myfile4].lstmf*
*Load of images failed!!*

From this I understand that the error lies either in the process of 
creating the *.lstmf* files or in the images I have selected. 
Any suggestions are welcome.
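
A basic sanity check on the *.lstmf* files before training could look like the sketch below. It is written under assumptions: the helper names are hypothetical, the check only rules out missing or empty files (it does not parse the *.lstmf* format), and the list file is assumed to hold one *.lstmf* path per line, as `--train_listfile` expects. The `lstm_model` argument is assumed to be the LSTM component extracted from the starter model beforehand, for example with `combine_tessdata -e eng.traineddata eng.lstm`.

```python
from pathlib import Path


def usable_lstmf(paths):
    """Return the paths that exist and are non-empty; report the others."""
    kept = []
    for p in map(Path, paths):
        if p.is_file() and p.stat().st_size > 0:
            kept.append(p)
        else:
            print(f"skipping missing or empty file: {p}")
    return kept


def write_listfile(paths, listfile):
    """Write one usable .lstmf path per line, using plain \\n line endings."""
    kept = usable_lstmf(paths)
    with open(listfile, "w", encoding="utf-8", newline="\n") as fh:
        for p in kept:
            fh.write(f"{p}\n")
    return kept


def lstmtraining_command(listfile, lstm_model, traineddata, output_prefix,
                         max_iterations=400):
    """Build a fine-tuning command; every value here is illustrative."""
    return [
        "lstmtraining",
        "--continue_from", str(lstm_model),
        "--traineddata", str(traineddata),
        "--train_listfile", str(listfile),
        "--model_output", str(output_prefix),
        "--max_iterations", str(max_iterations),
    ]
```

If a file that fails to deserialize passes this check, the problem is more likely in how that *.lstmf* was generated than in the list file itself.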


On Tuesday, 14 January 2020 at 17:43:40 UTC+1, Fabio Lugli wrote:
>
> Hello everyone, I'm trying to train Tesseract on handwriting, knowing that 
> it's not the best option, using the latest version available for Windows. I 
> have access to a huge number of .tif files, each a line of handwritten text, 
> and I'm able to obtain the .box files, which I then edit to comply with the 
> latest requirements (boxes spanning the whole line, spaces between words, a 
> tab at the end). After that, I could not work out how to improve 
> eng.traineddata or how to create my own .traineddata file, even after 
> following the instructions at 
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00. 
> So what are the next steps to obtain a correct training dataset?
>
