Re: [tesseract-ocr] Re: OCR-D training process - High error rate [Tess 4]

Lorenzo Bolzani Sun, 08 Jul 2018 04:28:42 -0700

About the white border, maybe my suggestion was not so good.

I've seen that adding a generous white border at recognition time
sometimes helps a lot (both with character recognition and with character
splitting).


But I'm also seeing that training with one border size and doing
recognition with a different one gives a lot of errors.

I suppose the white border may compensate for a mismatch between the real
data and the training data in some cases, and create one in others.

So it's probably better to train with a very small border (or none?). In
any case, use the same border you will use with your real data (or do a
little "border augmentation", varying it by 1px or 2px).
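
To make the "border augmentation" idea concrete, here is a minimal sketch. It
is not part of ocr-d or tesseract; the function names, the list-of-rows image
representation and the 8-bit white value (255) are my own assumptions, chosen
only to keep the example self-contained.

```python
import random

WHITE = 255  # assumption: 8-bit grayscale line images on a white background


def add_white_border(pixels, border):
    """Return a copy of a grayscale image (list of rows) padded with white."""
    if border < 0:
        raise ValueError("border must be >= 0")
    width = len(pixels[0]) if pixels else 0
    blank_row = [WHITE] * (width + 2 * border)
    # White rows above, the original rows padded left and right, white rows below.
    padded = [list(blank_row) for _ in range(border)]
    for row in pixels:
        padded.append([WHITE] * border + list(row) + [WHITE] * border)
    padded.extend(list(blank_row) for _ in range(border))
    return padded


def augment_border(pixels, base_border, jitter=2, rng=random):
    """Pad with base_border plus a random extra of 0..jitter pixels."""
    return add_white_border(pixels, base_border + rng.randint(0, jitter))
```

Here base_border would be the border you expect at recognition time and
jitter the 1px or 2px variation. With real image files, Pillow's
ImageOps.expand(image, border=..., fill=...) does the same padding.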


Bye

Lorenzo

2018-07-07 18:41 GMT+02:00 Lorenzo Bolzani <[email protected]>:

>
> I never had this. It's strange that you are getting this now and not
> during the training.
>
> I would check the location I'm running the command from, I mean, that
> data/train/...lstmf is there, in the correct relative place.
>
>
> Second I would check the lstmf file size. Then I would inspect the tiff
> and gt.txt files the lstmf was generated from to see if they are empty,
> missing, wrong, etc.
>
>
> When I have these doubts I delete the box, lstmf, etc., and let ocr-d
> recreate everything.
>
>
> Or maybe there is something wrong with the training data; that is another
> possible reason for training improving for a while and then getting stuck.
>
>
> Lorenzo
>
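
The file checks from the quoted message can be scripted. This is only a rough
sketch: the function name is made up, and it assumes each .lstmf sits next to
a .tif and a .gt.txt with the same base name, which may not match your layout.

```python
from pathlib import Path


def find_suspect_samples(directory, image_suffix=".tif", text_suffix=".gt.txt"):
    """Return {lstmf file name: [problems]} for samples that look broken."""
    problems = {}
    for lstmf in sorted(Path(directory).glob("*.lstmf")):
        issues = []
        if lstmf.stat().st_size == 0:
            issues.append("empty lstmf")
        stem = lstmf.name[: -len(".lstmf")]
        # Assumption: the source image and ground truth share the lstmf's base name.
        for suffix in (image_suffix, text_suffix):
            source = lstmf.with_name(stem + suffix)
            if not source.is_file():
                issues.append(f"missing {source.name}")
            elif source.stat().st_size == 0:
                issues.append(f"empty {source.name}")
        if issues:
            problems[lstmf.name] = issues
    return problems
```

Run it on the directory that holds the training files and look at anything it
reports before deleting the box and lstmf files and regenerating them.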

