About the white border, maybe my suggestion was not so good. I've seen that sometimes adding a generous white border during recognition helps a lot (both with character recognition and character splitting).
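To make it concrete, this is roughly what I mean by adding a white border before recognition. It is only a minimal sketch: the function name `add_white_border` is made up for illustration, and it assumes a grayscale image stored as plain Python lists of 0-255 values so it runs without any dependency. With real images you would more likely use something like Pillow's `ImageOps.expand` or `numpy.pad`.

```python
def add_white_border(image, border, white=255):
    """Return a copy of a grayscale image padded with white on every side.

    `image` is a list of equal-length rows of 0-255 ints (hypothetical
    representation, chosen only to keep the example dependency-free).
    `border` is the padding width in pixels.
    """
    if border < 0:
        raise ValueError("border must be non-negative")
    width = len(image[0]) if image else 0
    blank_row = [white] * (width + 2 * border)

    # White rows on top, then each original row padded left and right,
    # then white rows at the bottom.
    padded = [list(blank_row) for _ in range(border)]
    for row in image:
        padded.append([white] * border + list(row) + [white] * border)
    padded.extend(list(blank_row) for _ in range(border))
    return padded


# A tiny 2x3 "text line": 0 is black ink, 255 is white background.
line = [[0, 0, 0],
        [0, 255, 0]]
padded = add_white_border(line, border=2)
print(len(padded), len(padded[0]))  # 6 7
```

The border size is a parameter on purpose, so it is easy to try a few values on the same image.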
But I'm also seeing that training with one border size and doing recognition with a different one gives a lot of errors. I suppose that the white border may somehow compensate for a mismatch between the real data and the training data (or create one). So it's probably better to train with a very small border (or none?). In any case, use the same border you will use with your real data (or do a little "border augmentation", like 1px or 2px).

Bye

Lorenzo

2018-07-07 18:41 GMT+02:00 Lorenzo Bolzani <[email protected]>:

> I never had this. It's strange that you are getting this now and not
> during the training.
>
> First, I would check the location I'm running the command from, I mean, that
> data/train/...lstmf is there, in the correct relative place.
>
> Second, I would check the lstmf file size. Then I would inspect the tiff
> and gt.txt files the lstmf was generated from to see if they are empty,
> missing, wrong, etc.
>
> When I have these doubts I delete the box, lstmf, etc., and let ocr-d
> recreate everything.
>
> Or maybe there is something wrong with the training data; this is another
> possible reason for training improving for a while and then getting stuck.
>
> Lorenzo

