Re: [tesseract-ocr] Re: lstmeval shows good result but visualized result looks bad

[email protected] Mon, 17 Jun 2019 05:38:05 -0700

Thanks, shree, for your reply. I can see that you are very busy answering a lot
of questions here, so thanks again for taking the time for me.
>
> Your files have prefix of jpn, so I assume you are training for Japanese,
> but the image in question has only numbers in it.
>
Well, I forgot to mention that my model only needs to recognize digits, not all
Japanese characters. I used the jpn prefix only because I am working
with Japanese documents.
Anyway, from your answer I understand there is a high chance that I am dealing
with an overfitting problem, not a problem with how I convert the checkpoint
file to a .traineddata file. Am I right? If so, I guess the first thing I
should try is to fine-tune your digits model (I found the one you shared on GitHub:
https://github.com/Shreeshrii/tessdata_shreetest). Correct me if I am wrong.
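
For reference, the fine-tuning route mentioned above could be sketched roughly as follows. This is only an outline under assumptions: the file names (digits.traineddata, list.train, the output directory) and the iteration count are placeholders, not values from this thread. The flags themselves are the ones described in the Tesseract 4 training documentation. The function only builds the command lines; it does not run them.

```python
import shlex


def finetune_commands(base_traineddata, train_listfile, output_dir, max_iterations=400):
    """Build the three command lines for fine-tuning an existing LSTM model.

    All paths are hypothetical placeholders supplied by the caller.
    """
    extracted_lstm = f"{output_dir}/base.lstm"
    checkpoint_prefix = f"{output_dir}/digits"

    # 1. Extract the LSTM network from the existing .traineddata file.
    extract = ["combine_tessdata", "-e", base_traineddata, extracted_lstm]

    # 2. Continue training from the extracted network on the new line images.
    train = [
        "lstmtraining",
        "--continue_from", extracted_lstm,
        "--traineddata", base_traineddata,
        "--train_listfile", train_listfile,
        "--model_output", checkpoint_prefix,
        "--max_iterations", str(max_iterations),
    ]

    # 3. Convert the resulting checkpoint into a usable .traineddata file.
    convert = [
        "lstmtraining",
        "--stop_training",
        "--continue_from", f"{checkpoint_prefix}_checkpoint",
        "--traineddata", base_traineddata,
        "--model_output", f"{output_dir}/digits_finetuned.traineddata",
    ]
    return [extract, train, convert]


if __name__ == "__main__":
    for cmd in finetune_commands("digits.traineddata", "list.train", "out"):
        print(shlex.join(cmd))
```

Step 3 is also the checkpoint-to-.traineddata conversion, so printing the commands makes it easy to compare them with what the ocrd-train Makefile actually runs.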


By the way, I have two more questions:
1. About how I generate the training data: since I could not find the right
font for my document, I cropped digit images from the data I have and
randomly picked cropped digits to compose each training image. Do you think
this is a reasonable way to do data augmentation?
2. I generated 2000 samples for training. Is that enough?
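
To make question 1 concrete, here is a minimal sketch of the generation idea, not the actual script used. It assumes each cropped digit is a grayscale image stored as a list of pixel rows, and all function names are hypothetical. It also splits the crops into disjoint training and evaluation pools before composing lines, so that the evaluation images are not built from the very same crops the model was trained on. Sharing crops between the two sets is one possible reason an evaluation error can look better than results on real images.

```python
import random


def split_crops(crops_by_digit, eval_fraction, rng):
    """Split each digit's crops into disjoint train and eval pools."""
    train, held_out = {}, {}
    for digit, crops in crops_by_digit.items():
        if len(crops) < 2:
            raise ValueError(f"need at least 2 crops for digit {digit!r}")
        shuffled = list(crops)
        rng.shuffle(shuffled)
        # Keep at least one crop on each side of the split.
        n_eval = min(len(shuffled) - 1, max(1, int(len(shuffled) * eval_fraction)))
        held_out[digit] = shuffled[:n_eval]
        train[digit] = shuffled[n_eval:]
    return train, held_out


def compose_line(pool, length, rng, gap=2, background=255):
    """Build one text-line image and its ground-truth string from random crops."""
    digits = sorted(pool)
    labels = [rng.choice(digits) for _ in range(length)]
    crops = [rng.choice(pool[d]) for d in labels]
    height = max(len(c) for c in crops)
    rows = [[] for _ in range(height)]
    for i, crop in enumerate(crops):
        width = len(crop[0])
        pad_top = (height - len(crop)) // 2  # center shorter crops vertically
        for y in range(height):
            src = y - pad_top
            if 0 <= src < len(crop):
                rows[y].extend(crop[src])
            else:
                rows[y].extend([background] * width)
            if i < len(crops) - 1:
                rows[y].extend([background] * gap)  # spacing between digits
    return rows, "".join(labels)
```

With the pools split this way, training lines would come from compose_line(train_pool, ...) and evaluation lines from compose_line(eval_pool, ...). Writing the pixel rows to .tif files and producing the matching .box files is left out of the sketch.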

On Mon, Jun 17, 2019 at 5:19 PM shree <[email protected]> wrote:

> Your files have prefix of jpn, so I assume you are training for Japanese,
> but the image in question has only numbers in it.
>
> Getting good results on eval data but bad results on OCR could be the
> result of overfitting the model, if you have used a small sample and
> trained for a large number of iterations.
>
>
> On Friday, June 14, 2019 at 8:35:40 AM UTC+5:30, Phuc wrote:
>>
>> Hi
>> I am training a model using Tesseract's lstmtraining and am confused
>> about the result I get. I wonder if I am doing anything wrong in the steps
>> below:
>>
>>    - I create training data (.box and .tif) following
>>    https://github.com/tesseract-ocr/tesseract/issues/2357. Note that each
>>    (.box, .tif) pair includes multiple text lines
>>    - I run the training process using https://github.com/OCR-D/ocrd-train.
>>    Since I already have the .box files, I simply comment out the
>>    `generate_line_box.py` line inside the Makefile
>>    - After training, I use lstmeval to evaluate the model on an
>>    evaluation dataset and get an error rate that is not so bad
>>
>> [image: 図1.png]
>>
>>
>>    - But when I take the exact same image from the evaluation dataset and
>>    run prediction with the .traineddata file, the result is totally
>>    different
>>
>> I also attach some files of my training data and the visualized result in
>> case anyone wants to take a look
>>
>> I would appreciate it if someone could tell me what I did wrong
>>
>> Thanks
>>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/a6090eb0-6803-4242-b2e9-9cf27ca65126%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/a6090eb0-6803-4242-b2e9-9cf27ca65126%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

