Try this: https://github.com/Sintun/PersonalHelperPrograms/blob/master/Tesseract/tess.cpp
Longer story: https://github.com/tesseract-ocr/tesseract/issues/1714

Zdenko

On Wed, Jul 1, 2020 at 10:29, [email protected] <[email protected]> wrote:

> I want to optimise tesseract 4 (lstm) for a set of documents I have.
> I managed to improve its character recognition using the documentation in
> https://tesseract-ocr.github.io/tessdoc/TrainingTesseract-4.00.
>
> However, some words are just not detected, usually words inside tables.
> Even using --psm 6, some are missed.
>
> Is there a way to train the layout/segmentation/word detection engine and
> not just the character recognition?

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8zA9coS_cjXLrpf6tT5Go7fEhihEBGMLhcKJkiDU0v9RQ%40mail.gmail.com.

