Tesseract is an OCR engine, and the user is responsible for preprocessing; see the documentation. IMO there is already an app (using Tesseract) for what you are trying to do: Text Fairy [1]
[1] https://play.google.com/store/apps/details?id=com.renard.ocr&hl=en

Zdenko

On Wed, 31 Jan 2024 at 2:00, Borneq <[email protected]> wrote:

> First I tested Tesseract on a file generated as a flat image.
> I generated Lorem Ipsum text:
>
> 5 paragraphs, 452 words, 2978 bytes, 24 lines + 4 blank lines; the maximum
> line length in my editor was 135 chars.
>
> Result: 100% accurate except for two full stop marks, fantastic.
>
> Next, I rotated the image. A rotation of only 0.7 degrees caused a lot of
> confusion, and a minor rotation of 0.1-0.6 degrees caused some "m" to be
> read as "n".
>
> In my book photos, the images are often rotated by up to 3.5 degrees.
> Worse, the lines of text are bent into curves, like an F-distribution
>
> ("What function looks like the edge of a paper book sideways?" on
> math.stackexchange.com)
>
> How can I work with real photos of books? Is this possible as an option,
> or is it something that is missing in Tesseract?
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/9ac3343e-df3c-432e-8066-af21a20eda1cn%40googlegroups.com
> .

--
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8wdJtDmAiBLstMRU2CVe_ZL2RiMeZH5wk%3DXFW-crK16yw%40mail.gmail.com.
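
Neither message names a specific preprocessing method. As one possible illustration of the kind of preprocessing the reply refers to, the sketch below estimates a small page rotation with a projection-profile search in plain Python. The function names, the list of (x, y) dark-pixel coordinates used as input, and the 5 degree range with 0.1 degree steps are all assumptions made for this example; none of them is part of Tesseract. It covers only uniform rotation, not the curved text lines described in the question.

```python
import math


def estimate_skew_degrees(ink_pixels, max_angle=5.0, step=0.1):
    """Estimate page skew from (x, y) coordinates of dark pixels.

    Hypothetical helper, not a Tesseract API. For each candidate angle,
    every pixel is shifted vertically by -x * tan(angle). When the candidate
    matches the real skew, the pixels of one text line land in the same row,
    so the row histogram is sharply peaked and the sum of squared counts
    is at its maximum.

    Image coordinates are assumed: y grows downward, so a positive result
    means the text lines drop as x increases.
    """
    best_angle, best_score = 0.0, -1.0
    steps = int(round(max_angle / step))
    for i in range(-steps, steps + 1):
        angle = i * step
        slope = math.tan(math.radians(angle))
        profile = {}
        for x, y in ink_pixels:
            row = round(y - x * slope)
            profile[row] = profile.get(row, 0) + 1
        score = sum(count * count for count in profile.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


def synthetic_lines(skew_degrees, width=400, line_count=6, spacing=30):
    """Made-up test data: one-pixel-thick 'text lines' tilted by skew_degrees."""
    slope = math.tan(math.radians(skew_degrees))
    return [
        (x, round(50 + k * spacing + x * slope))
        for k in range(line_count)
        for x in range(width)
    ]


if __name__ == "__main__":
    pixels = synthetic_lines(2.0)
    print(f"estimated skew: {estimate_skew_degrees(pixels):.1f} degrees")
```

On a real photo, the dark-pixel coordinates would come from a binarized image, and the page would be rotated back by the estimated angle with an image library before being passed to Tesseract. That step is left out here because it depends on the library used.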

