I had no problems training with the OCR-D boxes. Looking at the TIFFs, the
first thing I'd try is adding some white border on the left and right.
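A minimal sketch of that padding step, assuming each line image is loaded as an 8-bit grayscale NumPy array (for example `np.array(Image.open(path).convert("L"))` with Pillow, saved back with `Image.fromarray(padded).save(path)`) and that white is 255. Because padding changes the image width, the box file has to be regenerated afterwards, so the sketch also builds line-level box entries where every character shares one box covering the whole line. That matches the identical values in the OCR-D boxes. The function names, the 10-pixel default border and the coordinates of the trailing tab entry are my own assumptions, not taken from ocrd-train.

```python
from __future__ import annotations

import numpy as np


def pad_line_image(pixels: np.ndarray, border: int = 10, white: int = 255) -> np.ndarray:
    """Add `border` white columns on the left and right of a 2-D grayscale line image."""
    if pixels.ndim != 2:
        raise ValueError("expected a 2-D grayscale array")
    # Pad only the column axis (axis 1); rows (axis 0) stay unchanged.
    return np.pad(pixels, ((0, 0), (border, border)), mode="constant", constant_values=white)


def line_box_entries(text: str, width: int, height: int, page: int = 0) -> list[str]:
    """Build box-file lines in which every character shares one box covering the image.

    Box-file format: <symbol> <left> <bottom> <right> <top> <page>,
    with coordinates measured from the bottom-left corner of the image.
    """
    box = f"0 0 {width} {height} {page}"
    # One entry per code point: combining marks are NOT grouped with their base char.
    entries = [f"{ch} {box}" for ch in text]
    # Assumption: end the line with a tab entry that reuses the line box;
    # tools differ in the exact coordinates they give this entry.
    entries.append(f"\t {box}")
    return entries


if __name__ == "__main__":
    line = np.full((20, 100), 255, dtype=np.uint8)  # hypothetical blank 20x100 line image
    padded = pad_line_image(line, border=10)
    height, width = padded.shape
    print(padded.shape)  # (20, 120)
    print("\n".join(line_box_entries("ab", width, height)))
```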

For my training I used non-binarized (grayscale) data, and I think it may
work better, since more information is available.
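To illustrate what "more information" means here: a hard threshold maps every pixel to one of two values, so the anti-aliased edge pixels that carry stroke shape are lost, while a luminance conversion keeps up to 256 levels. A minimal sketch, assuming RGB input as a NumPy array; the ITU-R 601-2 luma weights are the ones Pillow documents for its "L" mode conversion, and the threshold of 128 is an arbitrary value chosen for illustration.

```python
import numpy as np


def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) uint8 RGB array to 8-bit grayscale with ITU-R 601-2 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    luma = rgb[..., :3].astype(np.float64) @ weights  # weighted sum over the channel axis
    return np.clip(np.rint(luma), 0, 255).astype(np.uint8)


def binarize(gray: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Global threshold: pixels >= threshold become white (255), the rest black (0)."""
    return np.where(gray >= threshold, 255, 0).astype(np.uint8)


if __name__ == "__main__":
    # Hypothetical 1x4 strip: black ink, two anti-aliased edge pixels, white paper.
    strip = np.array([[[0, 0, 0], [90, 90, 90], [170, 170, 170], [255, 255, 255]]], dtype=np.uint8)
    gray = to_grayscale(strip)
    print(gray.tolist())            # [[0, 90, 170, 255]]
    print(binarize(gray).tolist())  # [[0, 0, 255, 255]]
```

The two edge pixels keep distinct values in grayscale but collapse into plain black and white after thresholding.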

Are you training from scratch or fine-tuning a model? How many epochs did
you do? How long did it run? Maybe you just need to wait longer.
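For the epoch question, a rough estimate, assuming `lstmtraining` processes one text line per iteration: epochs ≈ iterations ÷ training lines, so if all 1869 lines were used for training, 10000 iterations would be only about 5.35 epochs. Since a single char train value can go up while the overall trend still goes down, the sketch also smooths the values with a moving average. The regex assumes the progress lines contain a field like `char train=NN.NNN%`; the function names, the window size and the log file name are hypothetical.

```python
import re
from collections import deque

# Assumption: lstmtraining progress lines contain a field like "char train=12.345%".
CHAR_TRAIN = re.compile(r"char train=([0-9.]+)%")


def estimate_epochs(iterations: int, training_lines: int) -> float:
    """Rough epoch count, assuming one text line is processed per iteration."""
    return iterations / training_lines


def char_train_values(log_lines):
    """Yield every char train percentage found in the log lines, in order."""
    for line in log_lines:
        match = CHAR_TRAIN.search(line)
        if match:
            yield float(match.group(1))


def moving_average(values, window: int = 5):
    """Yield the mean of the last `window` values (fewer at the start)."""
    recent = deque(maxlen=window)
    for value in values:
        recent.append(value)
        yield sum(recent) / len(recent)


if __name__ == "__main__":
    print(round(estimate_epochs(10000, 1869), 2))  # 5.35
    # Usage against a saved training log (hypothetical file name):
    # with open("training.log") as log:
    #     print(list(moving_average(char_train_values(log))))
```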

Please have a look at this thread too:

https://groups.google.com/forum/#!topic/tesseract-ocr/be4-rjvY2tQ


Bye

Lorenzo


2018-07-04 17:03 GMT+02:00 Joe <[email protected]>:

> I forgot to mention:
> The *.box files created by OCR-D are not in the same format as described
> in https://github.com/tesseract-ocr/tesseract/wiki/Making-Box-Files---4.0
> I know Tesseract 4 boxes only need to cover a text line instead of
> individual chars, but in the example given in that link every character box
> has different values, while in the *.box files created by OCR-D they all
> have the same values.
>
> Is that a problem?
>
>
> On Wednesday, July 4, 2018 at 11:50:54 UTC-3, Joe wrote:
>>
>> Hi everybody!
>>
>> I'm trying this tool https://github.com/OCR-D/ocrd-train/ but without
>> success so far. Tesseract and Leptonica are installed by the scripts.
>> Inspired by the test set provided in that repo, I created pairs of
>> [*.tif, *.gt.txt] with binarized chars and TTFs from two fonts (1869 text
>> lines in total).
>> You can see an example of my set in the attachment, which also contains
>> files created by the training process.
>>
>> My guess is that something is wrong with my data.
>> Sometimes I can see the char train value increasing instead of decreasing,
>> and the final error rate is still too high (about 60%).
>>
>> That new training process with LSTM is driving me crazy!
>> I would appreciate it if anyone with experience could take a look at my
>> data set.
>>
>>
>> Joe.
>>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/tesseract-ocr/601364b4-3ebd-4a04-9f6a-3d418ab728ab%
> 40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/601364b4-3ebd-4a04-9f6a-3d418ab728ab%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>
