Hello everybody,
I just finished fine-tuning according to Ray's tutorial.
I did the following steps:
1. I used tesstrain.sh to create the training data and the starter
traineddata. The training data consists of eng.training_text with the
± character added multiple times.
2. I used combine_tessdata to extract eng.lstm from the best
eng.traineddata.
3. I used lstmtraining with the extracted eng.lstm and the starter
traineddata from step 1 to train the model.
This is the end of training:
*At iteration 1264/3000/3000, mean rms=0.202%, delta=0.003%, BCER
train=0.020%, BWER train=0.072%, skip ratio=0.000%, New worst BCER = 0.020
wrote checkpoint. Finished! Selected model with minimal training error rate
(BCER) = 0.017 *
4. Then I took a screenshot of a text line set in the same font I used to
create the training data, and ran tesseract on it with the finished
traineddata (the text also appears verbatim in the training data).
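For reference, steps 1 to 3 can be sketched as the command lines below. This is a minimal Python sketch that only builds and prints the commands; the working directory, the font name, the path to the best eng.traineddata and the `plusminus` output prefix are hypothetical placeholders, and the flags follow the ± example in Ray's TrainingTesseract 4.00 tutorial as I remember it, not necessarily the exact commands used here. Two parts are assumptions not stated above: the tutorial's ± example also passes `--old_traineddata` pointing at the best model, and the final `--stop_training` call is the usual way to turn the `_checkpoint` file into a .traineddata that tesseract can load.

```python
"""Sketch of the fine-tuning steps as command lines (printed, not executed)."""
import shlex
from pathlib import Path


def pipeline_commands(work: Path, font: str, best: Path,
                      iterations: int = 3000) -> list[list[str]]:
    """Return argument lists for steps 1-3 plus the checkpoint conversion.

    All paths, the font name and the "plusminus" prefix are hypothetical.
    """
    starter = work / "eng" / "eng.traineddata"  # starter traineddata from tesstrain.sh
    base_lstm = work / "eng.lstm"               # network extracted from best
    prefix = work / "plusminus"                 # hypothetical checkpoint prefix
    return [
        # Step 1: render eng.training_text (with the extra ± lines) into
        # line images and build the starter traineddata.
        ["tesstrain.sh", "--fonts_dir", "/usr/share/fonts", "--lang", "eng",
         "--linedata_only", "--noextract_font_properties",
         "--langdata_dir", "../langdata", "--tessdata_dir", "./tessdata",
         "--fontlist", font, "--output_dir", str(work)],
        # Step 2: extract the LSTM network from the best eng.traineddata.
        ["combine_tessdata", "-e", str(best), str(base_lstm)],
        # Step 3: continue training from the extracted network.
        # --old_traineddata is an assumption taken from the tutorial's ± example.
        ["lstmtraining", "--model_output", str(prefix),
         "--continue_from", str(base_lstm),
         "--traineddata", str(starter),
         "--old_traineddata", str(best),
         "--train_listfile", str(work / "eng.training_files.txt"),
         "--max_iterations", str(iterations)],
        # Assumed extra step: convert the final checkpoint into a
        # .traineddata file that tesseract can load.
        ["lstmtraining", "--stop_training",
         "--continue_from", f"{prefix}_checkpoint",
         "--traineddata", str(starter),
         "--model_output", str(work / "eng_plusminus.traineddata")],
    ]


if __name__ == "__main__":
    work = Path("~/tesstutorial/trainplusminus").expanduser()  # hypothetical
    best = Path("tessdata/best/eng.traineddata")                 # hypothetical
    for cmd in pipeline_commands(work, "Arial", best):           # hypothetical font
        # Replace print with subprocess.run(cmd, check=True) to run them.
        print(shlex.join(cmd))
```

Building the commands as argument lists (rather than one long shell string) keeps font names with spaces intact; `shlex.join` is only used to print them in copy-pasteable form.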
This is the text in the image:
*New Articles page ± 23 a To Service ~~ a details DC that don't *
This is the result with the freshly trained model:
*Ne Artic(Tes page = 23 aa To Bervice ww a detHiTs Dc that don lt *
When I use the best eng.traineddata model I get this output:
*New Articles page = 23 a To Service ~~ a details DC that don't*
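To put a number on "bad", here is a minimal Python sketch (my own illustration, not part of Tesseract) that computes a plain character error rate: Levenshtein distance divided by the length of the reference line. It assumes the three lines above are the complete reference and outputs, with the surrounding asterisks and trailing spaces dropped. The training log prints BCER as a percentage (`BCER train=0.020%`), so the selected model's "BCER = 0.017" appears to mean about 0.017%; I am assuming lstmtraining's definition is close to this one, but I have not checked its exact normalization.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate relative to the reference length."""
    return levenshtein(reference, hypothesis) / len(reference)


REF = "New Articles page ± 23 a To Service ~~ a details DC that don't"
FINE_TUNED = "Ne Artic(Tes page = 23 aa To Bervice ww a detHiTs Dc that don lt"
BEST = "New Articles page = 23 a To Service ~~ a details DC that don't"

if __name__ == "__main__":
    for name, hyp in [("fine-tuned", FINE_TUNED), ("best", BEST)]:
        print(f"{name:>10}: CER = {100 * cer(REF, hyp):.1f}%")
```

The best model's only error is the single ± read as =, whereas the fine-tuned model's errors are spread across almost every word, which is far above the sub-1% BCER reported at the end of training.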
Can someone explain why I get such a bad result? The training seems fine and I
don't get any error messages, yet everything I get back from my "fine-tuned"
model is absolute crap and much worse than the original one.
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/tesseract-ocr/a0896f38-6190-4e29-8cd9-44713e6ccd1en%40googlegroups.com.