After some work I am able to:
- Use the method *lstmbox* of *tesseract.exe* to obtain the *.box* files of
my *.tif* images
- Use the third party software *JTessBoxEditor* to correct the recognized
characters, leaving boxes all around the full line of text
- Use the method *lstm.train* of
I tried this path again, not remembering where I got stuck, and after
following all the instructions and running *make training* the terminal is
stuck at the first step:
*unicharset_extractor --output_unicharset "data/eng/unicharset" --norm_mode
2 "data/eng/all-gt"*
From here it does nothing,
Thanks for the suggestion, I already tried this one but I will try again!
On Wednesday, 15 January 2020 at 15:29:57 UTC+1, shree wrote:
>
> Take a look at tesseract-ocr/tesstrain
>
>
Galbhtha , MUP foe Manomadl) Cxclaomgle
> File eng.test.pro5.lstmf line 0 :
> Mean rms=3.708%, delta=18.921%, train=78.266%(95.833%), skip ratio=0%
> Iteration 4: GROUND TRUTH : nominating any more Labour life Peers
> Iteration 4: BEST OCR TEXT : wominading any wone Loabour Lfe. "
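To put a number on how far a BEST OCR TEXT line is from its GROUND TRUTH line, a character error rate can be computed by hand. This is only a minimal sketch using plain Levenshtein distance; it is not necessarily how lstmtraining computes the percentages it prints, and the function names are made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_error_rate(truth: str, ocr: str) -> float:
    """Edit distance divided by the length of the ground truth."""
    if not truth:
        return 0.0 if not ocr else 1.0
    return levenshtein(truth, ocr) / len(truth)


if __name__ == "__main__":
    # The two strings are copied from the log lines quoted above.
    truth = "nominating any more Labour life Peers"
    ocr = 'wominading any wone Loabour Lfe. "'
    print(f"CER: {char_error_rate(truth, ocr):.1%}")
```

Running the same comparison on a fixed set of lines before and after fine-tuning would show per line whether the model improved or regressed.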
Yes, I was setting debug_interval -1, but on Windows it didn't show the
error after iteration 0. After installing Ubuntu on WSL and repeating the
whole process, it showed that the problem was that the *eng.traineddata* I was
mistakenly using wasn't from tessdata_best, so the already seen *integer
t; better to use Linux.
>
> On Thu, Jan 16, 2020 at 6:30 PM 'Fabio Lugli' via tesseract-ocr <
> tesser...@googlegroups.com > wrote:
>
>> Thank you very much, now I can see them. But obviously, after one
>> simple step forward here is another wall:
Thank you very much, now I can see them. But obviously, after one
simple step forward here is another wall:
*Warning: LSTMTrainer deserialized an LSTMRecognizer!*
*Continuing from ./tessdata/unpacked/eng.lstm*
*Loaded 1/1 lines (1-1) of document eng.test.pro0.lstmf*
*Loaded 1/1 lines
Hello everyone, I'm trying to train tesseract on handwriting, knowing that
it's not the best option, using the latest version available for Windows. I
have access to a huge amount of .tif files, lines of handwritten text; I'm
able to obtain the .box files, which I later edit to be compliant with
After working a couple of days on my dataset, I have seen that the
fine-tuned model on handwritten text gets better on some lines of text, but
worse on others, so I trained again and the results didn't change. Is it
normal that the model gets better on some text but worse on other text over
each
I still get the error, but I understand it comes from how I write the
*all-lstmf* file, from which lstmtraining can't get the images. Right now I
write into it:
*[FULL PATH TO MY FILE]/eng.test.pro0.lstmf*
*[FULL PATH TO MY FILE]/eng.test.pro1.lstmf*
*[FULL PATH TO MY FILE]/eng.test.pro2.lstmf*
etc.
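A minimal sketch for generating such a list file from a script instead of by hand. It assumes all the .lstmf files sit in one directory, and it writes one absolute path per line with forward slashes and LF line endings, in case backslashes or CRLF endings on Windows are part of the problem (that is an assumption, not a confirmed cause). The function name and the paths in the usage comment are hypothetical.

```python
from pathlib import Path


def write_lstmf_list(lstmf_dir: str, list_file: str) -> list[str]:
    """Write the absolute path of every .lstmf file in lstmf_dir to list_file."""
    paths = sorted(Path(lstmf_dir).resolve().glob("*.lstmf"))
    # as_posix() yields forward slashes even on Windows.
    lines = [p.as_posix() for p in paths]
    # newline="\n" keeps Python from writing CRLF line endings on Windows.
    with open(list_file, "w", encoding="utf-8", newline="\n") as f:
        for line in lines:
            f.write(line + "\n")
    return lines


# Example call (hypothetical locations):
# write_lstmf_list("data/eng-lstmf", "data/all-lstmf")
```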