Your files have the prefix jpn, so I assume you are training for Japanese, 
but the image in question contains only digits.

Getting good results on the eval data but bad results in actual OCR could 
be a sign that the model is overfitted, which can happen if you used a 
small training sample and trained for a large number of iterations.
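
One rough way to check this is to estimate how many passes the training 
made over your data and to measure the character error rate of the actual 
OCR output against the ground truth, so that it can be compared with the 
figure lstmeval reports. The sketch below is only an illustration: the 
function names are made up for this example, the error rate is a plain 
Levenshtein-based calculation rather than lstmeval's own metric, and it 
assumes that one lstmtraining iteration processes one training line.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum inserts, deletes, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_error_rate(truth: str, ocr: str) -> float:
    """Edit distance as a percentage of the ground-truth length."""
    if not truth:
        return 0.0 if not ocr else 100.0
    return 100.0 * edit_distance(truth, ocr) / len(truth)


def approx_epochs(iterations: int, num_lines: int) -> float:
    """Passes over the training set, assuming one line per iteration."""
    return iterations / num_lines


if __name__ == "__main__":
    # Hypothetical values: replace with your own ground truth, OCR output,
    # iteration count and number of training lines.
    print(char_error_rate("12345", "12845"))
    print(approx_epochs(10000, 200))
```

If the error rate measured this way is far higher than the lstmeval 
figure, and the number of passes is large for a small set of lines, 
overfitting is a plausible explanation.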


On Friday, June 14, 2019 at 8:35:40 AM UTC+5:30, Phuc wrote:
>
> Hi
> I am training a model using Tesseract's lstmtraining and am confused by 
> the results I get. I wonder whether I did anything wrong in the steps below:
>
>    - I created the training data (.box and .tif) following 
>    https://github.com/tesseract-ocr/tesseract/issues/2357. Note that each 
>    (.box, .tif) pair includes multiple text lines
>    - I ran the training process using https://github.com/OCR-D/ocrd-train. 
>    Since I already have the .box files, I simply commented out the 
>    `generate_line_box.py` line inside the Makefile
>    - After training, I used lstmeval to evaluate the model on an 
>    evaluation dataset and got an error rate that is not so bad
>
> [image: 図1.png]
>
>
>    - But when I take the exact same image from the evaluation dataset and 
>    run prediction using the .traineddata, the result seems to be totally 
>    different
>
> I have also attached some of my training data files and the visualized 
> result in case anyone wants to take a look
>
> I would appreciate it if someone could tell me what I did wrong
>
> Thanks
>
